PREFACE


This book is an outgrowth of the two parts of Mathematics of Statistics by 
John F. Kenney and myself (Van Nostrand, Part I, 3rd ed., 1954; Part II, 
2nd ed., 1951). It seemed advisable to prepare a new textbook which would 
emphasize the inferential and decision-making aspects of statistics and which 
would assume a mathematical background on the part of the reader roughly 
intermediate between those of Parts I and II. 

Part I has for several years been used as a textbook for first-year students 
lacking any previous knowledge of calculus but with a fairly good background 
of high school or junior college algebra. Part II, on the other hand,
presupposes at least two years of calculus with a corresponding mathematical
maturity, and is primarily intended for senior or even graduate students
desiring a deeper understanding of statistical principles. The present work requires
a knowledge of elementary calculus and is adapted to the usual third-year 
university level in mathematics. The first chapter contains some of the ele- 
ments of set theory, but this material is being more and more widely taught 
nowadays, even in quite junior courses, and is almost essential for the under- 
standing of the idea of probability. 

Throughout the book proofs are given where possible, but in many instances 
the mathematical details are relegated to the Appendix. There also will be 
found brief treatments of topics which the student is unlikely to have en- 
countered in his regular courses—the gamma and beta functions, Stirling’s 
approximation, Jacobians, Bernoulli numbers, etc. Although matrix algebra 
is nowadays much more prominent than formerly, and is invading even fresh- 
man courses, we have included in the Appendix enough of the elements of 
this subject to permit the occasional use of matrix notation in the body of the 
book. The brevity and convenience of this notation make it well worth while 
for the student of statistics, at least in the more advanced parts of the subject, 
to spend a little time on mastering the necessary algebra. 

The present book was planned as another joint effort by Professor Kenney 
and myself. However, as the work proceeded, the bulk of the writing was left 
to me, and Professor Kenney eventually decided that his name ought not in 
fairness to be attached to it. I am much indebted to him for his generous 
action and for his continuing advice and criticism throughout the period of 
preparation. In common with many others of my generation, I learned the
elements of statistics from Kenney's Mathematics of Statistics, Parts I and II,
and I value very highly the privilege of having collaborated with him in the 
later revised editions of these books. 

Since the concept of probability is fundamental to statistical inference, the 
first chapter is concerned mainly with the elements of the calculus of prob- 
ability. The treatment is heuristic rather than rigorous, but does attempt to 
give a reasonable foundation of the idea of probability as a measure, and to 
interpret probability objectively in terms of relative frequencies and subjec- 
tively in terms of betting odds. 

The second chapter contains the essential statistical techniques of summar- 
izing the data in a sample prior to making inferences about the population. 
The routine computations of mean, variance, median, etc., are described for 
the benefit of those students without any previous statistical training. The 
general properties of distributions, with their cumulants and cumulant gen- 
erating functions, are then discussed, and illustrated by reference to a number 
of special probability distributions (binomial, Poisson, normal, gamma and 
beta, chi-square, log-normal). This leads up to the relation between a sample 
and the population from which it is drawn and the concepts of confidence 
intervals and fiducial inference. 

In Chapter 6 the principles of testing hypotheses and making decisions with 
assigned risks of error are introduced. The method of maximum likelihood 
and the concept of the power of a test are dealt with. All this is crucial in any 
discussion of statistical inference. 

After a treatment of different sampling procedures, including sequential 
methods, the usual exact statistical tests on samples from a normal population 
are discussed. This is followed by the analysis of variance (which also as- 
sumes normality as generally applied) and by a discussion of certain non- 
parametric methods which can be used when it is unsafe to postulate a normal 
population. 

Bivariate (linear regression, correlation and contingency) problems are next 
dealt with, followed in Chapter 12 by non-linear regression and curve-fitting. 
Finally there is a short chapter on multivariate problems and stochastic proc- 
esses, giving only the barest introduction to these extensive fields. 

Sets of problems are included at the end of each chapter. These are 
arranged in groups according to the sections in the chapter to which they 
relate. Within each group the problems are roughly in order of difficulty. 
An attempt has been made to maintain a balance between numerical examples 
and questions on pure theory. For some of the numerical problems it is very 
desirable to have the use of a desk computer; everyone who has much to do 
with statistics is almost bound to acquire facility in the use of such computing 
machines. Much can be done with even so inexpensive and compact a device 
as the little pocket “Curta” calculator. Hints are provided for the solution of 
the more difficult problems. I am again grateful to Professor Kenney for 
permission to include a number of problems taken from Parts I and II of our
joint work already mentioned. 

The tables given in Appendix B should suffice for most statistical tests, but 
the student should if possible have access to more complete sets of tables, 
such as Pearson and Hartley's Biometrika Tables for Statisticians, Vol. I 
(Cambridge University Press, 1954) or Fisher and Yates' Statistical Tables 
for Biological, Agricultural and Medical Research (Oliver and Boyd, 5th ed., 
1957). 

For permission to reprint tables or portions of tables, I am indebted to the 
following: Sir Ronald A. Fisher, F.R.S., and Messrs. Oliver and Boyd, Ltd., 
Edinburgh (Tables B.3 and B.4); Professor M. S. Bartlett and the Department 
of Statistics, University College, London (Table B.1); Dr. A. J. Jonckheere, 
Professor E. S. Pearson, and the publishers of “Biometrika” (Table B.10, 8.3 
and 8.5); Dr. D. Auble and the Institute for Educational Research, Indiana 
University (Table B.9); Dr. G. W. Snedecor and the Iowa State University 
Press (Table B.5); Dr. F. J. Massey and the American Statistical Association 
(Table B.6); Dr. J. E. Walsh and the American Statistical Association (Table 
B.8); Dr. F. Wilcoxon and the American Cyanamid Company (Table 10.8). 

A list of references is given at the end of each chapter, but this list is not 
intended as even a partial bibliography. It serves merely to indicate to the 
student a few books or papers in which he may find a fuller treatment, or more 
detailed proofs, of some of the statements in the text, and also in a few cases 
to give the source of the numerical data used in problems. 

In so vast a subject as modern mathematical statistics a textbook writer has 
to be selective. It is highly probable that some statisticians, looking at this 
book, will feel that the emphasis is misplaced here and there or that a better 
choice of topics could have been made. I can only plead that to me the choice
has seemed reasonable for the type of student I had in mind. 

It is hardly possible for me to express adequately my indebtedness to all 
the teachers and writers from whose lectures or papers I have derived help and 
inspiration. I am grateful also to the publishers’ readers who examined this 
book in manuscript and offered valuable suggestions for improvement. In 
conclusion, I should like to express my appreciation of the help of Mrs. I. Maj, 
who coped most efficiently with the job of typing a manuscript plentifully
sprinkled with mathematical symbols.
E. S. K. 
FOREWORD


This book contains sufficient material for a two-semester or 
full-year course, with three lecture periods per week. Some 
less important sections, which might be omitted on a first 
reading, are starred. 

For a one-semester course, it would be advisable to read 
most of Chapters 1 to 6 (omitting the starred sections) and 
also the first parts of Chapters 8 and 11. The instructor will 
naturally have the responsibility of deciding on the material 
which he considers most relevant to the needs of his particular 
students. 

References to numbered equations within the same section 
are to the last part of the number only. Thus Eq. (1.10.6),
if referred to within §1.10, would be quoted as Eq. (6). If 
referred to in any later section, the complete number is 
quoted. 

Numbers enclosed in square brackets, such as [2], refer to
the literature references given at the end of the chapter.
Chapter 1
PROBABILITY 


1.1 Uncertainty of Statistical Inference In everyday life we are again and 
again faced with the necessity of making decisions. Many of these are trivial, 
some may be serious, but almost always there is an element of uncertainty about 
the wisdom of the decision we make. Scientists in their regular work have a 
similar problem. They have to draw conclusions from their enquiries or experi- 
mental results, but their observations are liable to error and may from their very 
nature be subject to considerable irregular fluctuations. Any conclusions that 
the scientists draw will therefore not be rigid and unalterable, but will merely be 
more or less probable. The theory of probability is the groundwork of scientific 
inference, and as such is the subject of this chapter. 

Statistics is concerned with variables that fluctuate in a more or less unpre- 
dictable way, such as the monthly total of highway fatalities in the state of New 
York or the yearly average yield of wheat in bushels per acre on a Saskatchewan 
farm. There may be assignable causes, with predictable effects on the total of fatal 
accidents in a particular month, but no one would expect to be able, month after 
month, to predict the total exactly. The essence of a statistical variable is that,
to some extent at least, it is unpredictable. We call this characteristic randomness
and will later give it a more precise definition.

Since in almost all experimental work, and particularly in the biological and
social sciences, the results are influenced by a variety of conditions largely
beyond the experimenter's control, there is always this element of randomness
about the results of experiment. Variations in rainfall and soil composition affect
plant yields; individual peculiarities affect the behavior of rats or guinea pigs.
Even in the “exact” sciences, such as physics and astronomy, with their relatively
high precision of measurement, there is still a residuum of unavoidable
experimental error. It is the task of statistical inference to draw valid conclusions
about the world around us from such limited and imperfect observations as we
can make. Since these conclusions are not certain, we would like to attach to
them an estimate of the probability of their truth. How this can be done in
certain types of problems will be told in later chapters.

Probability has in modern atomic physics an even deeper significance. The
“principle of indeterminacy,” formulated originally by Heisenberg, lays it
down as a cardinal truth of physics that certain pairs of variables, such as
position and momentum, or energy and time, cannot both be measured, even in
principle, with unlimited precision, but that the more accurately one of the pair
is known, the less accurately can the other be determined. This has nothing to do
with the limitations and errors of actual physical apparatus; it is a theoretical
limitation that would hold with perfect apparatus. A consequence of this 
principle is that the variable representing a particle in modern quantum theory 
is something that, as far as it can be represented physically at all, is a probability 
—the probability, roughly speaking, that the particle is at a certain position in 
space at a certain time. 


1.2 Intuitive Idea of Probability Everyone has a general idea of what is 
meant by the words “probable” and "probability." We hear the radio announcer 
at breakfast-time say that it will probably rain before the afternoon, and we 
decide to wear a raincoat to the office. It is, however, by no means a simple 
matter to give a definition of probability that will adequately cover all cases and 
serve as a satisfactory foundation for statistical inference. 

Some people feel that probability refers only to a state of mind, the strength 
of one's belief in a proposition. In order to make probability more than merely 
subjective, these writers have to speak of the degree of "rational" belief in a 
proposition, something that should perhaps be called credibility rather than 
probability. By making some rather arbitrary assumptions it is possible to 
arrive at a numerical calculus of probabilities, but we shall not for the present 
pursue this line of thought any further. (See references [1] and [2] at the end of the
chapter, and also §1.8).

The mathematical treatment of probability arose historically out of dis- 
cussions of games of chance in the 17th century (see [3], [4], [5]). If a die seems to 
be honest and well made, it is reasonable to suppose that it is equally likely to 
fall, if rolled in the customary way, with any of its six faces uppermost. This 
judgment merely involves a recognition of the fundamental symmetry of the die. 
It is not perfectly symmetrical, of course, since the faces are marked differently, 
but we judge that this minimal lack of symmetry will not appreciably affect the 
chances. Similarly if five cards are dealt from a well-shuffled deck, we feel that 
(unless the dealer is crooked) any specified set of five cards is about as likely as 
any other. In these cases it is a comparatively simple matter to calculate the 
probabilities of events that may be of interest —the probability for instance that 
a 6 turns up on a die, or that the five cards dealt from a deck of 52 are all of one 
suit. Trouble arises when we cannot assume the fundamental symmetry—how 
can we assess the probability of 6 with a loaded die? 

There are many writers who feel that the only kind of probability definition 
that makes much sense, particularly in statistics, is based on the idea of the 
relative frequency with which events of interest happen in a long series of similar 
trials. To assess the probability of 6 with a die (loaded or not) we roll it a large 
number of times and count the number of times 6 turns up. The ratio of this 
number to the total number of rolls is the relative frequency of 6 and is an 
approximation to the true probability of 6. Unfortunately we cannot simply say 
that the probability is the limiting value of this ratio as the number, n, of trials
increases, since the ratio does not tend to a limit in the strict mathematical sense.
What we can say is that, if some conditions are satisfied, then in the long run it
becomes almost certain that the difference between the observed relative fre- 
quency and a fixed limit will be less than any number we like to name. The 
smaller this number, of course, the greater the number of trials we shall have to 
make to reach a state of “almost certainty." The conditions to be satisfied are, 
first, that the successive trials are independent (which means that the result of 
any trial is not influenced by what has happened on previous trials) and, second, 
that the context of the trials has remained essentially unaltered. This means that 
all the circumstances surrounding the trials are either unchanged during the 
whole set of observations or, if they are changed, have no appreciable effect on the 
trials. The decision as to what circumstances are or are not relevant is one 
which is often needed in experimental work, and must be made on the basis of 
experience. 

The above statement is a special form of the “law of large numbers," which 
will be stated more precisely later on. As the number of trials increases, the 
relative frequency of the event in question converges stochastically (or, as it is 
sometimes expressed, “converges in probability”) to a limiting value, which is 
defined as the probability of the event. For a detailed, semipopular discussion
of this concept, see [6].
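
The behavior of the relative frequency described in this section can be illustrated by simulation. The following is only a sketch under stated assumptions: the function name `relative_frequency`, the fixed seed, and the particular loading of the die are our own illustrative choices, and a pseudo-random generator stands in for actual rolls of a die.

```python
import random

def relative_frequency(n_rolls, weights=None, seed=0):
    """Simulate n_rolls independent rolls of a die and return the
    relative frequency of a 6.

    weights=None means an honest die; otherwise weights gives the
    (hypothetical) loading of faces 1..6. The seed is fixed only so
    that a run can be repeated.
    """
    rng = random.Random(seed)
    faces = (1, 2, 3, 4, 5, 6)
    rolls = rng.choices(faces, weights=weights, k=n_rolls)
    return rolls.count(6) / n_rolls

if __name__ == "__main__":
    # The relative frequency settles down as the number of trials grows.
    for n in (10, 100, 1000, 10000, 100000):
        print(n, relative_frequency(n))
    # An illustrative loaded die: face 6 is five times as heavy as the others.
    print(relative_frequency(100000, weights=(1, 1, 1, 1, 1, 5)))
```

For an honest die the printed values should drift toward 1/6, and for the loaded die toward 1/2, although any single run may of course deviate somewhat.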


1.3 Events We shall take the point of view that probability relates to
events, which are phenomena that may be observed either to happen or not to 
happen in a particular context. Thus if two dice are rolled repeatedly, one event 
in which we may be interested is a total of seven spots turning up. This event has 
a definite probability within the context of rolls of these particular dice. The 
context includes not only the rolls that have actually occurred, but all those that 
might conceivably occur if we had unlimited time and patience to go on rolling 
the dice. Another type of event is a measurement, for example, of an intelligence 
quotient for a 10-year-old Negro boy. The observed value, say 114, to the 
nearest whole number, is one that has a probability significance within the con- 
text of all such measurements on boys in the United States or only those on 
10-year-old boys, or perhaps only those on all 10-year-old Negro boys. The 
particular context depends on what probability we are interested in. 

Events may be classified as simple or compound. A compound event is one 
that can be decomposed into a set of simple events, whereas a simple event 
cannot be decomposed any further. The occurrence of 6 in a throw of a die is a 
simple event. The occurrence of 7 with two dice is a compound event, because it 
can be split up into six simple events, each of which corresponds to the same 
compound event, namely, 6 and 1, 5 and 2, 4 and 3, 3 and 4, 2 and 5, 1 and 6. 
Here the first number in each pair represents the number of spots shown by the 
first die and the second number that shown by the second die. 

A simple event may be represented by a point in a suitable “space.” The 
space corresponding to the number of spots on the upper face of a die consists of 
six isolated points, numbered 1, 2, 3, 4, 5, 6, on the axis of real numbers. Any 
observed simple event, such as a 6, is represented by one of these points. The
compound event, “a throw of at least 4,” is represented by the three points
labelled 4, 5, and 6. The space representing all the possibilities in a particular
situation is often denoted by $\mathcal{U}$, meaning the universal set of all possibilities.
The space, $\mathcal{U}$, corresponding to the possible outcomes with two dice, is a
square lattice of 36 points, as in Figure 1. The x-coordinate represents the first
die and the y-coordinate the second die. The compound event, “the total of spots
shown is 7,” is represented by the set of six points which are ringed in the figure.
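
The spaces just described are small enough to be written out in full. As a minimal sketch (the variable names are our own and purely illustrative), the lattice of 36 points and the two compound events mentioned above may be listed as follows:

```python
from itertools import product

# The space of all possibilities with two dice: 36 ordered pairs
# (spots on first die, spots on second die).
space = set(product(range(1, 7), repeat=2))

# Compound event "the total of spots shown is 7", a subset of the space.
total_is_seven = {(x, y) for (x, y) in space if x + y == 7}

# One-die space and the compound event "a throw of at least 4".
one_die = set(range(1, 7))
at_least_four = {x for x in one_die if x >= 4}

if __name__ == "__main__":
    print(len(space))
    print(sorted(total_is_seven))
    print(sorted(at_least_four))
```

Each element of `total_is_seven` is one of the simple events that make up the compound event.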


FIG. 1 SIMPLE AND COMPOUND EVENTS, WITH TWO DICE

FIG. 2 COMPOUND EVENT IN CONTINUOUS SPACE (x-axis: husband's age; y-axis: wife's age; both from 15 to 100)


An insurance company may be interested in the age distribution of married
couples. If x years represents the husband's age and y years the wife's age, the
space $\mathcal{U}$ for the event, “the husband is x years old and the wife is y years old,”
is a region of the x-y plane between limits (say 15 and 100) for both variables.
The compound event, “the husband is older than the wife,” will be represented
by the shaded area in Figure 2, below the line x = y.


1.4 Elements of Set Theory Applied to Events We shall denote events by 
letters, $A$, $B$, $C$, ..., or sometimes by $A_1$, $A_2$, $A_3$, .... These events will be
represented in diagrams by regions (or points) of the appropriate space. For
convenience the diagrams will be drawn as though the space were continuous, 
although it may often consist in reality of a set of discrete points (as in the 
example of the two dice, Figure 1). 

The basic idea of using diagrams like those in Figures 3 and 4 seems to be due 
to the 18th-century Swiss mathematician Euler. Refinements were made by the 
British logician John Venn (1834-1883), and such diagrams are now usually 
called Venn diagrams. 

The contrary event to $A$ (i.e., the event “$A$ does not happen”) will be
represented by $\tilde{A}$, which may be read as “A-tilde,” or “not-A.” The events $A$ and $\tilde{A}$
together make up the whole of the appropriate space $\mathcal{U}$. Hence $\tilde{A}$ is often called
the complement of $A$ (see Figure 3a).

If $A$ and $B$ are events within the same space $\mathcal{U}$, $A$ is said to imply $B$ (or,
symbolically, $A \subset B$) if whenever $A$ occurs $B$ necessarily occurs. The region
corresponding to $A$ is nowhere outside the region corresponding to $B$ (see Figure
3b). For example, the event, “the number of spots shown by a die is 4,” implies
the event, “the number of spots shown is even.” If $A \subset B$ and $B \subset A$, the events
$A$ and $B$ are equivalent (symbolically, $A = B$).

The event, “both $A$ and $B$” (i.e., the simultaneous occurrence of both
events), is called the intersection of $A$ and $B$ and is represented by $A \cap B$.
In a Venn diagram it is represented by the intersection of the areas
corresponding to $A$ and $B$ (Figure 3c).

The event, “$A$ and/or $B$” (i.e., the occurrence of at least one of the events
$A$ and $B$), is called the union of $A$ and $B$, and is denoted by $A \cup B$. It is
represented by the whole area that is included in either the $A$ area or the $B$ area
(Figure 3d). If the events $A$ and $B$ are such that they cannot both happen at
the same time, they are said to be disjoint, or mutually exclusive. In this case
$A \cup B$ is represented by the sum of the areas of $A$ and $B$ (Figure 3e), and the
union is then often denoted by $A + B$. The symbol “+” here denotes a logical
sum and means “either ... or.” It is not the plus sign of arithmetic.

FIG. 3 VENN DIAGRAMS: (a) $A$ and $\tilde{A}$; (b) $A \subset B$; (c) $A \cap B$; (d) $A \cup B$; (e) $A + B$

The above notation is readily extended to a finite number of events. The
intersection of the events $A_1, A_2, \ldots, A_n$ is denoted by

$$A_1 \cap A_2 \cap A_3 \cdots \cap A_n \quad \text{or} \quad \bigcap_{k=1}^{n} A_k,$$

while their union is denoted by $A_1 \cup A_2 \cdots \cup A_n$ or $\bigcup_{k=1}^{n} A_k$. If the events
are disjoint, the union is often denoted by $\sum_{k=1}^{n} A_k$.

The event $A + \tilde{A}$, or $\mathcal{U}$, may be interpreted as a sure event (one that is
bound to happen). It is represented by the whole of the appropriate space. The
event $A \cap \tilde{A}$, or $\emptyset$, may be interpreted as an impossible event. The set of points
representing $\emptyset$ is said to be null, or empty.


1.5 Some Theorems on Union and Intersection of Events The following
theorems may readily be proved from the definitions of the preceding section.
They are most easily appreciated by reference to the corresponding Venn
diagrams (Figures 3 and 4).

THEOREM 1.1 If $A \subset B$, then $\tilde{B} \subset \tilde{A}$.

THEOREM 1.2 $A \cap B = B \cap A$ and $A \cup B = B \cup A$. In algebraic language,
both operations are commutative.


THEOREM 1.3 $(A \cap B) \cap C = A \cap (B \cap C) = A \cap B \cap C$,
and $(A \cup B) \cup C = A \cup (B \cup C) = A \cup B \cup C$.
Both operations are associative.


THEOREM 1.4 $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.


See Figure 4, a and b. The two operations are distributive with respect to one 
another. 


FIG. 4 VENN DIAGRAMS FOR THREE EVENTS: (a) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$; (b) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$


THEOREM 1.5 $\widetilde{(A \cap B)} = \tilde{A} \cup \tilde{B}$.
$\widetilde{(A \cup B)} = \tilde{A} \cap \tilde{B}$.


It will be observed that in Theorems 1.2 to 1.5 there is a perfect duality 
between union and intersection. If in any theorem the symbols of union and 
intersection are interchanged, another true theorem results. 

For further elementary discussion of sets, see [7]. 
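
For a finite space the theorems of this section can also be checked by brute force, which is no substitute for a proof but may help the reader to become familiar with the notation. The sketch below assumes the one-die space of §1.3 and represents events as Python sets; the names `complement`, `all_events` and `check_theorems` are our own and purely illustrative.

```python
from itertools import combinations

U = frozenset(range(1, 7))  # one-die space: faces 1 to 6

def complement(a):
    """The contrary event A-tilde within the space U."""
    return U - a

def all_events(space):
    """Every subset of the space, i.e. every possible event."""
    items = sorted(space)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

EVENTS = all_events(U)  # 2**6 = 64 events

def check_theorems():
    """Check Theorems 1.1 to 1.5 for all events in the space."""
    for a in EVENTS:
        for b in EVENTS:
            if a <= b:  # Theorem 1.1: A implies B
                assert complement(b) <= complement(a)
            # Theorem 1.2: both operations are commutative
            assert a & b == b & a and a | b == b | a
            # Theorem 1.5: complement of intersection and of union
            assert complement(a & b) == complement(a) | complement(b)
            assert complement(a | b) == complement(a) & complement(b)
            for c in EVENTS:
                # Theorem 1.3: both operations are associative
                assert (a & b) & c == a & (b & c)
                assert (a | b) | c == a | (b | c)
                # Theorem 1.4: each distributes over the other
                assert a & (b | c) == (a & b) | (a & c)
                assert a | (b & c) == (a | b) & (a | c)
    return True

if __name__ == "__main__":
    print(check_theorems())
```

Here `&` plays the part of intersection, `|` of union, and `<=` of the relation “implies.”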


1.6 Probability as a Measure An event, as we have seen, corresponds to a
subset of the universal set $\mathcal{U}$ of all possibilities in the particular situation.
Suppose we can in some reasonable way assign a weight (a non-negative number)
to each point or element of area of $\mathcal{U}$, so that the total of these weights is 1.
Then the weight assigned to an event $A$ will be the sum of the weights of all the
points (or elements of area) which make up $A$. This is called the measure of $A$.
We then define the probability of $A$ as equal to its measure.

It may be noted that the concept of measure has a much wider meaning than 
this in modern mathematics. Probability measure is only one among many 
types of measure. 


The assignment of weights in an actual problem will depend upon the
information available and on an analysis of all the possibilities. Thus if we have 
no reason to doubt the accuracy of a given die and the honesty with which it is 
rolled, it seems reasonable to assign the same weight to each of the six logically 
possible outcomes. In other words we take the weights as each equal to 1/6, since
they must add up to 1. The measure (and therefore the probability) of the event,
“the number of spots is even,” is 1/2, since this event includes the three simple
events, 2, 4 and 6.

Suppose that there are three horses, A, B, C, in a race, and that, on form,
I judge that A is twice as likely to win as either B or C, but that the chances of
B and C are about equal. I would then assign weights 1/2, 1/4, 1/4 to A, B, C,
respectively. The probability of the event, “either A or B wins,” would
be 1/2 + 1/4 = 3/4. The probability of the event, “either B or C wins,” would be
1/4 + 1/4 = 1/2.

The techniques of calculating probabilities, once the weights have been 
assigned, will occupy the major part of this chapter. 
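
The two assignments of weights discussed in this section may be written out as a short computation. This is only a sketch: the function name `probability` and the dictionaries `die` and `horses` are our own illustrative names, and exact fractions are used so that the weights add up to 1 without rounding error.

```python
from fractions import Fraction

def probability(event, weights):
    """Measure of an event: the sum of the weights of its points."""
    return sum((weights[point] for point in event), Fraction(0))

# Honest die: the same weight, 1/6, on each of the six outcomes.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Three-horse race: A judged twice as likely to win as either B or C.
horses = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}

p_even = probability({2, 4, 6}, die)        # "the number of spots is even"
p_a_or_b = probability({"A", "B"}, horses)  # "either A or B wins"
p_b_or_c = probability({"B", "C"}, horses)  # "either B or C wins"

if __name__ == "__main__":
    print(p_even, p_a_or_b, p_b_or_c)
```

Whatever the source of the weights, the only requirements are that they be non-negative and that their total be 1.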


1.7 Properties of a Probability Measure The basic properties of the
probability measure $P(A)$ of an event $A$ are

(1.7.1) $P(A) = 1$ if $A = \mathcal{U}$.

This means that some event in $\mathcal{U}$ is bound to happen.

(1.7.2) $0 \le P(A) \le 1$ for every $A$ in $\mathcal{U}$.

(1.7.3) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

The first two follow immediately from the definition. The third may be
appreciated by reference to Figure 3, c and d. In reckoning $P(A) + P(B)$, the
weight of every element of the intersection is counted twice. The sum is therefore
greater than $P(A \cup B)$ by the measure of this intersection.

If $A$ and $B$ are disjoint,

(1.7.4) $P(A + B) = P(A) + P(B)$.

This is called the addition law for probabilities.

Since $A$ and $\tilde{A}$ are disjoint, and $A + \tilde{A} = \mathcal{U}$, it follows from equations (1)
and (4) that

(1.7.5) $P(A) + P(\tilde{A}) = 1$.

This rule is often useful when it happens to be easier to calculate $P(\tilde{A})$ than
$P(A)$. We can find $P(A)$ by calculating $1 - P(\tilde{A})$.

Since the impossible event $\emptyset$ is the complement of the sure event, we see that

(1.7.6) $P(\emptyset) = 1 - P(\mathcal{U}) = 0$.

The probability of an impossible event is zero.
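
Properties (1.7.1) to (1.7.6) can be checked numerically on the two-dice space of §1.3. The sketch below assumes equal weights of 1/36 on each point; the events chosen and the names `P`, `A`, `B`, `no_six` are our own illustrations and do not come from the text.

```python
from fractions import Fraction
from itertools import product

U = set(product(range(1, 7), repeat=2))  # two-dice space, 36 points

def P(event):
    """Probability measure with equal weight 1/36 on each point of U."""
    return Fraction(len(event & U), len(U))

A = {(x, y) for (x, y) in U if x + y == 7}  # total of spots is 7
B = {(x, y) for (x, y) in U if x == 6}      # first die shows 6

# (1.7.3): the intersection is counted twice in P(A) + P(B).
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)

# (1.7.5): "at least one 6" is most easily found from its complement,
# "no 6 on either die".
no_six = {(x, y) for (x, y) in U if x != 6 and y != 6}
p_at_least_one_six = 1 - P(no_six)

if __name__ == "__main__":
    print(lhs, rhs, p_at_least_one_six, P(U), P(set()))
```

The last computation is an instance of the remark that $1 - P(\tilde{A})$ is sometimes easier to obtain than $P(A)$ itself.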


It should be understood that because $P(A) = 0$ it does not necessarily follow
that $A$ is impossible. If the set $\mathcal{U}$ is represented by the points inside a square of
unit side, it would be reasonable in some contexts to make the measure of any
sub-region of the square equal to the area of that region. The measure of any set of
isolated points or finite line segments would then be zero, although we could not say
that these points or lines correspond to impossible events. If a property holds
within a given region, except for a set of points of measure zero, it is said to hold
almost everywhere. The statement, therefore, that $P(A) = 0$ does not imply that $A$ is
impossible. It does imply that the measure of the points corresponding to $A$ is zero.

FIG. 5 UNION OF SEVERAL EVENTS

In the same way, a probability of 1 does not necessarily mean that the event in
question is absolutely sure, but only that the exceptional points in the
appropriate space have a measure zero.


THEOREM 1.6 If $A \subset B$, then $P(A) \le P(B)$.
To prove this, let $C$ be the part of $B$ not included in $A$; then $B = C + A$.
By equation (4) above,

$$P(B) = P(C) + P(A) \ge P(A),$$

since $P(C) \ge 0$ by (2).


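THEOREM 1.7 For $n$ events $A_k$ $(k = 1, 2, \ldots, n)$,

$$P\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) \le \sum_{k=1}^{n} P(A_k).$$

It is readily seen from a Venn diagram (see Figure 5) that

$$\bigcup_{k=1}^{n} A_k = A_1 + \tilde{A}_1 \cap A_2 + \tilde{A}_1 \cap \tilde{A}_2 \cap A_3 + \cdots + \tilde{A}_1 \cap \tilde{A}_2 \cdots \cap \tilde{A}_{n-1} \cap A_n.$$

These disjoint sets (for the case $n = 4$) are shaded differently in the figure. But
$\tilde{A}_1 \cap A_2 \subset A_2$, $\tilde{A}_1 \cap \tilde{A}_2 \cap A_3 \subset A_3$, etc.; therefore, by Theorem 1.6,
$P(\tilde{A}_1 \cap A_2) \le P(A_2)$, etc. It follows that

$$P\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) \le P(A_1) + P(A_2) + \cdots + P(A_n) = \sum_{k=1}^{n} P(A_k).$$

This theorem may be extended to include infinite sets of events. The equality
sign holds for disjoint events.
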
THEOREM 1.8 The extension of Eq. (3) to three events gives

(1.7.7) $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.

This is easily proved by writing $D$ for the event $B \cup C$ and applying Eq. (3).
Note that, by Theorem 1.4, $A \cap D = (A \cap B) \cup (A \cap C)$.
The result may be further extended to $n$ events as follows:

(1.7.8) $P\Bigl(\bigcup_{j=1}^{n} A_j\Bigr) = \sum_{j} P(A_j) - \sum_{j<k} P(A_j \cap A_k) + \sum_{j<k<l} P(A_j \cap A_k \cap A_l) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cdots \cap A_n)$,

where $\sum_{j<k}$ means that the sum is over all pairs $j$ and $k$ with $j \ne k$, each pair
being counted once, $\sum_{j<k<l}$ means that the sum is over all sets of three, $j$, $k$, $l$,
with no two of these equal, and so on.
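
The extension of Theorem 1.8 to $n$ events, together with the inequality of Theorem 1.7, can be checked on a small example. The following is a sketch under our own assumptions: a space of twelve equally weighted points and three arbitrarily chosen events; the name `union_probability` is hypothetical. Each intersection of $r$ distinct events enters once, with sign $(-1)^{r-1}$.

```python
from fractions import Fraction
from itertools import combinations

def union_probability(events, P):
    """Probability of the union of the given events, computed as the
    alternating sum over intersections of 1, 2, ..., n distinct events."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r - 1)
        for group in combinations(events, r):  # each group counted once
            total += sign * P(set.intersection(*group))
    return total

# Illustrative space: the integers 1 to 12, each with weight 1/12.
U = set(range(1, 13))

def P(event):
    return Fraction(len(event), len(U))

events = [
    {x for x in U if x % 2 == 0},  # multiples of 2
    {x for x in U if x % 3 == 0},  # multiples of 3
    {x for x in U if x <= 4},      # at most 4
]

direct = P(set.union(*events))               # measure of the union itself
by_formula = union_probability(events, P)    # alternating-sum formula
upper_bound = sum(P(e) for e in events)      # right-hand side of Theorem 1.7

if __name__ == "__main__":
    print(direct, by_formula, upper_bound)
```

The direct count and the formula should agree, and neither can exceed the simple sum of the three probabilities.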


1.8 Interpretation of Probability in Terms of Betting Odds A recent book by
L. J. Savage [2] emphasizes the personal aspect of probabilities. Savage argues
that the rational man acts as if there exists for him, corresponding to each
situation in which he has to make a decision, a set of probabilities and a set of
utilities. The probabilities relate to the various states or aspects of the world
that may be supposed to exist (relevant to the particular situation). The utilities
measure in some way the values that will accrue to him from each particular
decision for each particular unknown state of the world. He acts in such a way
as to make the utility he expects to get as great as possible.

The probabilities, being a personal matter, can be assessed by presenting the
man with a suitable bet. If there are just two relevant states of the world, $s_1$ and
$s_2$, with probabilities $p_1$ and $p_2$, and if the man is offered betting odds of $C$ to 1
against $s_1$, he will take the bet, provided $p_2/p_1 < C$. By varying $C$ until the bet is
accepted, the values of $p_1$ and $p_2$ can be determined (remembering that
$p_1 + p_2 = 1$). When the English physicist, Sir John Cockcroft, said recently,
speaking about the British atomic reactor, Zeta, “I am 90% certain that the
neutrons were produced by a thermonuclear reaction,” he meant presumably
that he would be willing to bet as high as 9 to 1 that the reaction was in fact
thermonuclear. When a man bets on a horse which is quoted at odds of 6 to 1
against, it may be concluded that he regards the probability of its winning as at
least 1/7. (If $p_1 \ge 1/7$, $p_2/p_1 \le 6$.) At least the “rational man” would so act.
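
The arithmetic connecting betting odds with probabilities is easily mechanized. In the sketch below the function names are our own and purely illustrative; it assumes only the two relations used in this section, $p_1 + p_2 = 1$ and acceptance of odds of $C$ to 1 against $s_1$ when $p_2/p_1$ does not exceed $C$, so that $p_1$ is at least $1/(C + 1)$.

```python
from fractions import Fraction

def min_probability_from_odds_against(c):
    """Smallest p1 consistent with accepting odds of c to 1 against s1:
    p2/p1 <= c with p1 + p2 = 1 gives p1 >= 1/(c + 1)."""
    return 1 / (Fraction(c) + 1)

def odds_against(p1):
    """The ratio p2/p1 for a state of probability p1 (0 < p1 <= 1)."""
    p1 = Fraction(p1)
    return (1 - p1) / p1

if __name__ == "__main__":
    # Horse quoted at 6 to 1 against.
    print(min_probability_from_odds_against(6))
    # "90% certain": odds against are 1 to 9, i.e. 9 to 1 in favor.
    print(odds_against(Fraction(9, 10)))
```

Exact fractions are used so that 1/7 and 1/9 are not blurred by decimal rounding.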


1.9 Interpretation of Probability in Terms of Relative Frequency If an
event $A$ is assigned the probability $p$, in a particular context, one interpretation
of this probability is that in a long series of $n$ similar trials in this context, the
event would happen in approximately $np$ trials and fail to happen in the
remainder. As already mentioned, this is a special case of the law of large numbers.
Where there is no other natural or reasonable method of assigning probabilities, 
the experimental method of counting the number of times (r) that the event 
happens, and using the approximation r/n for p, is usually applicable. Insurance 
companies estimate the chance that an insured person will survive to a given 
age by a study of the records of such persons in the past. Actually, the context 
has not, in this particular example, remained quite constant. Improvements in 
public health, and new drugs, have in recent years greatly increased the chances 
of survival of individuals, so that the probabilities assessed from the records of, 
say, the past 50 years tend to be too low. However, there is no way, except from 
the records, to assess these probabilities. The companies should, of course, 
use the most recent records that are available. 

If, as the outcome of every trial, we can say that a particular event A has or has
not happened, and also at the same time that another event B has or has not
happened, there are clearly four possibilities altogether as regards the two events
jointly, namely, the events denoted by $A \cap B$, $A \cap \bar{B}$, $\bar{A} \cap B$,
and $\bar{A} \cap \bar{B}$. These are mutually exclusive, or disjoint.

If the corresponding frequencies for these compound events are a, b, c, and
d (the sum of all these being n), the relative frequencies are a/n, b/n, etc. The
relative frequency of A is $(a + b)/n = r_1/n$, where $r_1$ is the total frequency in the
first row of the two-by-two table (Figure 6), since both events $A \cap B$ and
$A \cap \bar{B}$ imply that A happens. If we denote the relative frequency of A by f(A), then


(1.9.1) $f(A) = f(A \cap B) + f(A \cap \bar{B})$.


FIG. 6 TWO-BY-TWO FREQUENCY TABLE


The event $A \cup B$ includes all cases in which either A or B (or both)
happen, that is, it includes $A \cap B$, $A \cap \bar{B}$ and $\bar{A} \cap B$, but not
$\bar{A} \cap \bar{B}$. The total frequency for $A \cup B$ is therefore
$a + b + c = r_1 + c_1 - a$, where $r_1$ is the total frequency in the first row and $c_1$
is the total frequency in the first column. Since $f(B) = c_1/n$, we have the rule


(1.9.2) $f(A \cup B) = f(A) + f(B) - f(A \cap B)$

If we regard the relative frequencies (for large n) as approximations to the 
corresponding probabilities, we arrive at the basic law for probabilities given 
in (1.7.3). 
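
As a numerical check of (1.9.1) and (1.9.2), the following Python sketch builds a two-by-two table and verifies both rules with exact fractions. The counts a, b, c, d used here are hypothetical, chosen only for illustration, and the function name is not from the text.

```python
from fractions import Fraction

def relative_frequencies(a, b, c, d):
    """Relative frequencies from a two-by-two table.

    a: A and B, b: A and not-B, c: not-A and B, d: neither.
    """
    n = a + b + c + d
    return {
        "A": Fraction(a + b, n),          # first-row total r1 over n
        "B": Fraction(a + c, n),          # first-column total c1 over n
        "A_and_B": Fraction(a, n),
        "A_and_notB": Fraction(b, n),
        "A_or_B": Fraction(a + b + c, n),
    }

# Hypothetical counts, chosen only for illustration.
f = relative_frequencies(a=30, b=40, c=10, d=20)

# (1.9.1): f(A) = f(A and B) + f(A and not-B)
print(f["A"] == f["A_and_B"] + f["A_and_notB"])
# (1.9.2): f(A or B) = f(A) + f(B) - f(A and B)
print(f["A_or_B"] == f["A"] + f["B"] - f["A_and_B"])
```

Exact fractions are used so that the two identities can be tested for equality without rounding error.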


1.10 Conditional Probability In the table of Figure 6, the event A happens
in $r_1$ cases altogether, and in a of these cases the event B also happens. We can
therefore state that the relative frequency of B, when it is known that A happens,
is given by

(1.10.1) $f(B|A) = a/r_1 = (a/n) \div (r_1/n) = f(B \cap A)/f(A)$,


where it is assumed, of course, that $r_1$ is not zero.
This rule also can be assumed to apply to probabilities, and we can in fact
define the conditional probability of B, given that A happens, by


(1.10.2) $P(B|A) = \dfrac{P(B \cap A)}{P(A)}, \quad P(A) \neq 0$

In the same way,

(1.10.3) $P(A|B) = \dfrac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$


The two events A and B are said to be independent if

(1.10.4) $P(A \cap B) = P(A) \cdot P(B)$


If this is so, then from (2) and (3) $P(B|A) = P(B)$ and $P(A|B) = P(A)$. This
means that the probability of A or of B does not depend at all upon whether
the other happens, in agreement with the intuitive idea of independence.
Equation (4) is often called the multiplication law for probabilities.

With more than two events the situation becomes rather complicated. Three
events A, B, C are independent if each pair (AB, AC and BC) is independent and
if also


(1.10.5) $P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C)$.


This implies that four probability conditions have to hold. These may be
$P(A|B) = P(A)$, $P(A|C) = P(A)$, $P(B|C) = P(B)$ and $P(C|A \cap B) = P(C)$.
There are also five other conditions (given by interchanging the letters A, B,
C) which hold when the first four hold.

The general relationship which replaces (5) when A, B, C are not independent
may be written

(1.10.6) $P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B), \quad P(A) \neq 0, \; P(A \cap B) \neq 0$


The fact that three events may be pairwise independent without being
completely independent may be illustrated by the following example [8]. Imagine
four similar discs in a bowl, numbered respectively 112, 121, 211, 222, and
suppose one disc is picked at random. Let the events A, B, C be "the first digit
on the disc picked is 1," "the second digit on this disc is 1," and "the third digit
on this disc is 1." Then it is easy to see that $P(A) = P(B) = P(C) = 1/2$ and
$P(A \cap B) = P(A \cap C) = P(B \cap C) = 1/4$, but $P(A \cap B \cap C) = 0$. The three pairs AB,
AC, and BC are therefore independent but the condition (5) is not satisfied, so
that A, B, C are not all three independent.
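
The disc example is small enough to check by direct enumeration. The sketch below lists the four discs, computes each probability as an exact fraction, and tests conditions (1.10.4) and (1.10.5); the helper names are illustrative and do not come from the text.

```python
from fractions import Fraction

DISCS = ["112", "121", "211", "222"]  # the four equally likely discs

def prob(event):
    """Probability of an event given as a predicate on a disc label."""
    favourable = sum(1 for disc in DISCS if event(disc))
    return Fraction(favourable, len(DISCS))

def A(disc):
    return disc[0] == "1"   # first digit is 1

def B(disc):
    return disc[1] == "1"   # second digit is 1

def C(disc):
    return disc[2] == "1"   # third digit is 1

# Check the multiplication law (1.10.4) for each of the three pairs.
pairwise = all(
    prob(lambda s, e=e, g=g: e(s) and g(s)) == prob(e) * prob(g)
    for e, g in [(A, B), (A, C), (B, C)]
)

# Probability that all three events happen together.
triple = prob(lambda s: A(s) and B(s) and C(s))

print(pairwise)
print(triple, prob(A) * prob(B) * prob(C))
```

The first line printed confirms pairwise independence, while the second shows that the triple intersection has probability 0 rather than the product 1/8 required by (1.10.5).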

1.11 Elements of Combinatorial Analysis Most ordinary calculations in
elementary probability are based on the assumption that every simple element
of the finite set $\mathcal{U}$ of all possible outcomes has the same weight. The probability
measure of an event A is therefore proportional to the number of elements of $\mathcal{U}$
included in A. Thus to find the probability of the event, "a total of 11 spots with
two dice," we assume that all of the 36 ordered pairs of numbers which can
represent the fall of the two dice have an equal weight. Two of these, namely
(6, 5) and (5, 6), correspond to a sum of 11, and the probability of this event, on
the basis of our assumption, is therefore 2/36. In more complicated situations
the calculation of the number of elements included in a particular subset
of $\mathcal{U}$ will often involve the mathematics of permutations and combinations.
We therefore recall briefly a few definitions and theorems, without giving proofs
of the latter.


THEOREM 1.9 The number of ordered arrangements (permutations) of n
distinguishable objects is $1 \cdot 2 \cdot 3 \cdots n$. This number is denoted by n! (read
"factorial n").


THEOREM 1.10 The number of ways of selecting and arranging in order r out
of n distinguishable objects is $n!/(n - r)!$. This is often denoted by $(n)_r$, a notation
due to Feller [9]. When r = n, the result should reduce to n!, so that we must
agree to define 0! as 1.


THEOREM 1.11 The number of ways of arranging in order $n_1$ objects all alike
of one kind, $n_2$ all alike of a second kind, and so on, up to k kinds of objects, is
$n!/(n_1!\, n_2! \cdots n_k!)$, where $\sum n_i = n$.


THEOREM 1.12 The number of ways of picking r out of n distinguishable
objects, regardless of the order in which they are arranged, is called the number of
combinations. It is given by $\binom{n}{r} = \dfrac{n!}{r!\,(n - r)!} = (n)_r/r!$. The symbol
$\binom{n}{r}$ may be read "n above r." Other notations such as C(n, r) or ${}^nC_r$ are also
met with, but the one used here seems to be increasingly common. From the
definition it follows immediately that $\binom{n}{0} = \binom{n}{n} = 1$. The symbol $\binom{n}{r}$ for
r > n is defined as 0.

Since $\binom{n}{r} = n(n - 1)(n - 2) \cdots (n - r + 1)/r!$, the notation can be
extended by writing

$\binom{-n}{r} = (-n)(-n - 1)(-n - 2) \cdots (-n - r + 1)/r!$

$= (-1)^r\, n(n + 1)(n + 2) \cdots (n + r - 1)/r!$

$= (-1)^r \binom{n + r - 1}{r}$.

THEOREM 1.13 The binomial theorem for a positive integral index may be
written

$(1 + x)^n = \sum_{r=0}^{n} \binom{n}{r} x^r = \sum_{r=0}^{\infty} \binom{n}{r} x^r$,

since all the coefficients $\binom{n}{r}$ vanish when r > n.


THEOREM 1.14 The binomial theorem for a negative integral index may be
written

$(1 + x)^{-n} = 1 - nx + \dfrac{n(n + 1)}{2!} x^2 - \cdots = \sum_{r=0}^{\infty} (-1)^r \binom{n + r - 1}{r} x^r = \sum_{r=0}^{\infty} \binom{-n}{r} x^r$


The two theorems 1.13 and 1.14 are therefore formally identical with the
substitution of -n for n. The first expansion however corresponds to a finite
number of terms and the second to an infinite series.


1.12 Sampling from a Finite Population The process of picking a set of r
objects out of a given set of n objects is often called sampling. The n objects
constitute a "population," or "universe," and the r objects constitute the sample.
If every object in the population has an equal chance to be chosen for the
sample, this sample is said to be random. Other kinds of sampling are of course
possible. For instance, we could arrange the population in some order and pick
every tenth object. This would be a systematic sample. In most problems of
statistical inference, where it is required to infer properties of a population from
those of a sample, it is understood that the sampling is random. Sometimes, for
convenience or even for increased accuracy, some special scheme of sampling
may be adopted, but if we are to make valid inferences from the sample to the
population there must be an element of randomness in the choice of the
sample. Sampling procedures are discussed more fully in a later chapter.

There are two ways in which we can choose our random sample. We may
pick an object, make a note of it, and put it back before picking the next object.
This is called "sampling with replacements," and of course implies that the same
object can appear more than once in the same sample. In fact, since there are n
possible choices for each item in the sample, the number of ways of picking the
sample (taking the order into account) is $n^r$. On the other hand, when an
object is picked for the sample it may be put on one side and removed from the
population. It is not then available to be picked again in the same sample. This
is the situation envisaged in Theorems 1.10 and 1.12, and is called "sampling
without replacements." The number of ways of picking the sample (taking order
into account) is now $(n)_r = n(n - 1) \cdots (n - r + 1)$.


EXAMPLE 1 The number of possible five-digit numbers (including those
beginning with one or more zeros) is $10^5$. The number with all five digits different
is $(10)_5 = 30,240$. The probability that a five-digit number selected at random
will have all its digits different is therefore 0.3024.
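
The two counting rules for ordered samples can be checked with Python's standard library. This is a minimal sketch: the function names are illustrative, and `math.perm` (available from Python 3.8) is used for the falling factorial $(n)_r$.

```python
import math

def ordered_samples_with_replacement(n, r):
    """Number of ordered samples of size r from n objects, with replacement."""
    return n ** r

def ordered_samples_without_replacement(n, r):
    """Number of ordered samples without replacement: n!/(n - r)!."""
    return math.perm(n, r)

# Five-digit numbers: ten possible digits, five positions.
total = ordered_samples_with_replacement(10, 5)
distinct = ordered_samples_without_replacement(10, 5)
print(total, distinct, distinct / total)
```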


EXAMPLE 2 The probability that in a class of 25 students no two will have
the same birthday is, by a similar argument, $(365)_{25}/(365)^{25}$. On the assumption
that all days in the year are equally likely as birthdays (and ignoring leap years),
the number of possible arrangements of birthdays among the 25 people is $(365)^{25}$
and the number of arrangements with all birthdays different is $(365)_{25}$. The
probability may be written as


$p = \left(1 - \dfrac{1}{365}\right)\left(1 - \dfrac{2}{365}\right) \cdots \left(1 - \dfrac{24}{365}\right)$


As a rough approximation, writing $\log_e(1 - x) \approx -x$, we have

$\log_e p \approx -\dfrac{1 + 2 + \cdots + 24}{365} = -0.822$


giving p = 0.44. The exact result, which may be evaluated by means of a table of
logarithms of factorials (e.g., Glover's Tables [10] or Biometrika Tables [11]),
is 0.4313. It is rather surprising to most people that the chance of at least two
coincident birthdays in a group of this size should be as high as it is, namely
0.5687 (= 1 - p).
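
Where tables of logarithms of factorials are not at hand, the product can be evaluated directly. The following sketch computes both the exact product and the rough approximation based on $\log_e(1 - x) \approx -x$; the function names and the default of 365 days are illustrative assumptions.

```python
import math

def prob_all_different(k, days=365):
    """Exact probability that k people all have different birthdays."""
    p = 1.0
    for i in range(1, k):
        p *= 1 - i / days
    return p

def prob_all_different_approx(k, days=365):
    """Rough approximation obtained by replacing log(1 - x) with -x."""
    return math.exp(-sum(range(1, k)) / days)

p_exact = prob_all_different(25)
p_approx = prob_all_different_approx(25)
print(round(p_exact, 4), round(p_approx, 2), round(1 - p_exact, 4))
```

Calling `prob_all_different` for other group sizes shows how quickly the chance of a coincidence grows with the size of the group.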


EXAMPLE 3 The probability of holding precisely three aces in a hand at
bridge is the ratio of the number of possible hands containing three aces to the
number of hands altogether. The basic assumption is that every completely
specified hand of 13 cards that can be dealt from a deck of 52 cards is just as
likely as every other, which is probably reasonable if the deck is more thoroughly
shuffled than is customary in actual play.


The total number of possible hands is $\binom{52}{13}$, which is a very large number,
about 635 billions. The number with three aces is given by multiplying the
number of ways of picking the three aces, namely $\binom{4}{3}$, by the number of ways
of picking the 10 other cards in the hand out of the 48 cards in the deck which
are not aces, namely $\binom{48}{10}$. The required probability is therefore
$\binom{4}{3}\binom{48}{10} / \binom{52}{13}$, which reduces to 0.0412.
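
The same count can be reproduced with `math.comb` from the Python standard library (Python 3.8 or later). The sketch below is an illustration under the equal-likelihood assumption stated in the example; the function name and its parameters are not from the text.

```python
import math

def prob_exactly_k_aces(k, hand=13, deck=52, aces=4):
    """Probability of exactly k aces in a hand dealt from a well-shuffled deck."""
    favourable = math.comb(aces, k) * math.comb(deck - aces, hand - k)
    return favourable / math.comb(deck, hand)

print(math.comb(52, 13))                  # total number of bridge hands
print(round(prob_exactly_k_aces(3), 4))   # probability of exactly three aces
```

Summing `prob_exactly_k_aces(k)` over k = 0, 1, 2, 3, 4 gives 1, which is a convenient check on the counting argument.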

EXAMPLE 4 Why does it pay, in the long run, to bet even money on seeing
6 at least once in four rolls of a die, but not to bet even money on seeing double-6
at least once in 24 rolls with two dice?

This problem was solved early in the history of probability theory. It was
posed by the Chevalier de Méré, a courtier and amateur mathematician at the
French court around 1650, to the celebrated mathematician Blaise Pascal. Since
the chance of 6 in a single throw with one die is six times as great as the chance of
double-6 with two dice, it seemed only natural to him that the number of throws
necessary for an even chance of seeing the second event should be just six times
the number necessary for the same chance of seeing the first event. However,
calculations did not seem to bear this out, and Pascal was able to show him that
his supposition was in fact not true.

It is easier to find the probability of the complementary event, no 6 at all,
than that of at least one 6. The probability of not seeing 6 in a single roll is 5/6,
and if the rolls are independent the probability of not-6 on four successive rolls
is $(5/6)^4$. The probability of at least one 6 is therefore $1 - (5/6)^4 = 0.518$, which
is a better than even chance.

Similarly, the chance of not seeing double-6 in all 24 successive rolls with two
dice is $(35/36)^{24}$ and the chance of at least one double-6 is therefore
$1 - (35/36)^{24} = 0.491$. This is a less than even chance. The two chances are so
nearly equal, however, that it seems unlikely that the difference could be detected
empirically in ordinary play.
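
Both probabilities follow from the same complementary-event argument, which the short sketch below evaluates numerically. The function name is illustrative, and independence of the rolls is assumed, as in the example.

```python
def prob_at_least_one(p_single, rolls):
    """Probability that an event of probability p_single occurs at least once
    in the given number of independent rolls."""
    return 1 - (1 - p_single) ** rolls

one_six = prob_at_least_one(1 / 6, 4)        # a 6 in four rolls of one die
double_six = prob_at_least_one(1 / 36, 24)   # double-6 in 24 rolls of two dice
print(round(one_six, 3), round(double_six, 3))
```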

EXAMPLE 5 A card is drawn from an ordinary deck, looked at, and replaced,
and the deck is shuffled. How many times should this be done in order to have a
90% chance of seeing the ace of spades at least once?

The same argument as in Example 4 leads to the conclusion that the chance of
not seeing the ace of spades in n successive draws is $(51/52)^n$. If this is put equal
to 0.1, we have an equation for n. Inverting both sides and taking common logs,
we obtain

$n(\log 52 - \log 51) = \log 10 = 1$

so that

$n = \dfrac{1}{0.0084} = 119$.

In problems such as this, n must necessarily be an integer, so that the
probability cannot always be adjusted exactly to a pre-assigned value. By taking
the next highest integral value of n, we ensure that the probability will be at least
equal to the value given.
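
The rule of taking the next highest integer can be written as a small function. This is a sketch under the assumption of independent draws with replacement; the name `draws_needed` and its arguments are illustrative.

```python
import math

def draws_needed(target, p_single):
    """Smallest integer n with 1 - (1 - p_single)**n >= target."""
    n = math.ceil(math.log(1 - target) / math.log(1 - p_single))
    # Guard against floating-point rounding at the boundary.
    while 1 - (1 - p_single) ** n < target:
        n += 1
    return n

n = draws_needed(0.90, 1 / 52)
print(n, 1 - (51 / 52) ** n)
```

The printed probability is slightly above 0.90, which illustrates that the pre-assigned value is reached or exceeded rather than matched exactly.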

1.13 The Indicator Function The idea of a function plays an important role
in probability theory. A function is a rule which takes us from one set (called the
domain of the function) to another set (called the range). To each element of
the domain the function assigns one element of the range. Thus the function $x^2$
may have as domain the whole set of real numbers, positive and negative and
zero, and if so its range is the set of positive real numbers and zero. To every
value of x there is just one value of $x^2$. The function is said to be defined on the
domain.

In probability theory, a function may be defined on the set of points such as
$a_i$ which belong to the universal set $\mathcal{U}$ of all possibilities (within the given
context). If its range consists of the set $R = (r_1, r_2, \ldots, r_k)$ and if the function
assigns to each $a_i$ the value $r_j$, we may write the functional relation as $f(a_i) = r_j$.

It often happens that the function assigns to several elements of $\mathcal{U}$ the same
element $r_j$. If so, we can define a probability measure on the space R by giving to
each element $r_j$ the sum of the weights of all the points $a_i$ which are such that
$f(a_i) = r_j$. If A denotes the set of all these points $a_i$, then P(A) is the weight
attached to $r_j$, where P is the original probability measure on $\mathcal{U}$. The measure
defined in this way on R is called the measure induced by the function f.

A particularly simple and useful example of a function is the indicator function
$I_A$, defined on the whole space $\mathcal{U}$. This has just two values in its range, 1 when
the point $a_i$ belongs to A (a subset of $\mathcal{U}$) and 0 when it belongs to $\bar{A}$. The indicator
function corresponding to an event A may be thought of as 1 whenever A happens
and 0 when it does not happen.

The indicator function of the event $A \cap B$ is given by


(1.13.1) $I_{A \cap B} = I_A \cdot I_B$


since $I_A$ and $I_B$ are both not zero only for points lying in the intersection of A
and B. Similarly for the union of A and B,


(1.13.2) $I_{A \cup B} = I_A + I_B - I_{A \cap B}$


as is easily verified by checking the values of the right hand side for the different
regions making up $A \cup B$ in a Venn diagram.
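
The check suggested for (1.13.1) and (1.13.2) can also be carried out point by point. The sketch below does so on the 36-point space for two dice; the two events A and B are hypothetical choices made only for illustration.

```python
from itertools import product

# Space of possibilities for two dice: 36 ordered pairs.
SPACE = list(product(range(1, 7), repeat=2))

def indicator(event):
    """Return the indicator function of an event (a set of points)."""
    return lambda point: 1 if point in event else 0

# Hypothetical events, chosen only for illustration.
A = {pt for pt in SPACE if pt[0] == 6}       # first die shows 6
B = {pt for pt in SPACE if sum(pt) >= 10}    # total of at least 10

I_A, I_B = indicator(A), indicator(B)
I_and, I_or = indicator(A & B), indicator(A | B)

# (1.13.1) and (1.13.2), checked at every point of the space.
product_rule = all(I_and(pt) == I_A(pt) * I_B(pt) for pt in SPACE)
union_rule = all(I_or(pt) == I_A(pt) + I_B(pt) - I_and(pt) for pt in SPACE)
print(product_rule, union_rule)
```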

If the whole space $\mathcal{U}$ is partitioned into a set of disjoint events $A_1, A_2, \ldots, A_n$
and if a function X is defined on all points of $\mathcal{U}$ by the relation $X(A_j) = x_j$
where $x_1, x_2, \ldots, x_n$ are real numbers, then X is called a simple random variable,
or a variate. From the definition of the indicator function, it follows that


(1.13.3) $X = \sum_{j=1}^{n} x_j I_{A_j}$


Thus for two dice the space of possibilities consists of 36 points (Figure 1).
If the variate X is the sum of spots shown by the two dice, $x_j$ may take any one
of the eleven values 2, 3, 4 ... 12. The set $A_1$ consists of the single point (1, 1).
The set $A_2$ consists of two points (1, 2) and (2, 1), and so on. A variate may thus
be thought of as a mapping from the space of possibilities $\mathcal{U}$ (the domain) into
the axis of real numbers (the range). All the elements of $A_1$ are mapped into the
point $x_1$, all the elements of $A_2$ into the point $x_2$, and so on. See Figure 7. We
shall for the most part adhere to the convention of representing a variate by a
capital letter such as X and the numerical values in its range by small letters such
as x. We can then, for instance, speak of the probability that X takes the
value $x_1$ or the set of values between $x_1$ and $x_2$.

This distinction between a variate X and a numerical value x (which the
variate may take) is one which the student should try to get clear in his mind.
The variate is a function on the space of events to the real axis. It associates with
each possible event $A_j$ a real number $x_j$, which may be the obvious number
connected with the event (as in the example of the sum of spots shown by two
dice) or a more or less arbitrary number (as when we denote a male birth by 1
and a female birth by 0). The domain of X is the set of all possible events $A_j$ in the
particular context considered, and the range of X is the corresponding set of real
numbers. The range, of course, may include only a discrete set of numbers or
may include all real numbers in a finite interval or even the whole of the real
axis. The domain is often called the "sample space" or the "possibility space."
A point in this space (or a set of points) is a possible event of the type considered.


1.14 Expectation If $P(A_j)$ is the probability of the event $A_j$, the expectation
of the variate X is

(1.14.1) $E(X) = \sum_j x_j P(A_j)$


EXAMPLE 6 If 10,000 tickets are sold in a lottery and if there is one prize of
$1000 and ten prizes of $50, what is the expectation of the worth of a single
ticket?

If X is the worth of a ticket in dollars it is a variate which takes three values,
namely, 1000, 50 and 0. The probabilities corresponding to these, on the
assumption that the winning tickets are picked by a purely random process, are
1/10,000, 1/1000 and 9989/10,000 respectively. The expectation of X is therefore
$(1000/10,000 + 50/1000), or 15 cents. If the price of the ticket were 15 cents
the lottery would be "fair," in the sense that the price would be equal to the
expectation. Actual lotteries and gambling games are not fair in this sense, since
a substantial percentage of the money raised goes to the organizers (or the
"bank") and is not available for prizes.
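
Definition (1.14.1) translates directly into a sum over (value, probability) pairs. The sketch below applies it to the lottery ticket, using exact fractions; the function name `expectation` and the list layout are illustrative choices.

```python
from fractions import Fraction

def expectation(distribution):
    """E(X) as the sum of x_j * P(A_j) over (value, probability) pairs."""
    assert sum(p for _, p in distribution) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in distribution)

# Worth of one ticket in dollars: (value, probability).
ticket = [
    (1000, Fraction(1, 10_000)),
    (50, Fraction(10, 10_000)),
    (0, Fraction(9_989, 10_000)),
]
print(expectation(ticket))   # expected worth in dollars, as an exact fraction
```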


EXAMPLE 7 Johnny is collecting a set of 12 kinds of coupon, one coupon 
being found in each packet of a particular breakfast cereal. If the family buys 
a new packet on Monday of each week, how long should he expect to have to 
wait before the set is complete? It is assumed that the different kinds of coupon 
are distributed at random in packets of the cereal. 

Suppose that on a particular Monday, after opening the new packet, Johnny
has collected x different kinds of coupon ($1 \leq x \leq 11$). The chance that the
packet to be opened in one week's time contains one of the kinds he already has
is x/12, and the chance that it has a new kind is 1 - x/12. The chance that he
has to wait two weeks before getting a new kind is $(x/12)(1 - x/12)$. The chance
that he has to wait r weeks and then gets a new kind is $(x/12)^{r-1}(1 - x/12)$.
Denoting this probability by p(r, x), we obtain as the expectation of r for a given
x the expression


$E(r) = \sum_{r=1}^{\infty} r\, p(r, x) = \left(1 - \dfrac{x}{12}\right)\left[1 + 2\left(\dfrac{x}{12}\right) + 3\left(\dfrac{x}{12}\right)^2 + \cdots\right] = \left(1 - \dfrac{x}{12}\right)\left(1 - \dfrac{x}{12}\right)^{-2} = \dfrac{12}{12 - x}$


The total expected time before the set is complete is therefore


$\sum_{x=1}^{11} \dfrac{12}{12 - x} = 12\left(\dfrac{1}{11} + \dfrac{1}{10} + \cdots + 1\right) = 36.2$ weeks
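
The closed form 12/(12 - x) and the total waiting time can be checked numerically. In the sketch below, `expected_wait` uses the closed form and `expected_wait_series` sums a truncated version of the series for the expectation of r; both names, and the truncation at 2000 terms, are assumptions made for illustration.

```python
from fractions import Fraction

def expected_wait(x, kinds=12):
    """Expected weeks until a new kind appears when x kinds are already held."""
    return Fraction(kinds, kinds - x)

def expected_wait_series(x, kinds=12, terms=2000):
    """The same quantity from the truncated series sum of r * p(r, x)."""
    q = x / kinds
    return sum(r * q ** (r - 1) * (1 - q) for r in range(1, terms + 1))

# Total expected time from holding one kind to holding all twelve.
total = sum(expected_wait(x) for x in range(1, 12))
print(float(total))
print(expected_wait_series(6), float(expected_wait(6)))
```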

THEOREM 1.15 If $I_A$ is the indicator function for the set A, $E(I_A) = P(A)$.
By the definition of $I_A$, it takes only the two values, 1 for A and 0 for $\bar{A}$. Its
expectation is therefore $1 \cdot P(A) + 0 \cdot P(\bar{A}) = P(A)$.


THEOREM 1.16 If X and Y are variates defined over the same space $\mathcal{U}$ of
possibilities, E(X + Y) = E(X) + E(Y).

If the space $\mathcal{U}$ is subdivided into disjoint sets $A_j$ (j = 1, 2 ... n) and also into
disjoint sets $B_k$ (k = 1, 2 ... m), then, by Eq. (1),


$E(X) = \sum_j x_j P(A_j), \quad E(Y) = \sum_k y_k P(B_k)$


Now the event $A_j$ may be separated into disjoint sets $A_j \cap B_1$, $A_j \cap B_2$, ...,
$A_j \cap B_m$ (see Figure 8), so that $A_j = \bigcup_k (A_j \cap B_k)$. Similarly, $B_k = \bigcup_j (A_j \cap B_k)$.
Therefore,


$E(X) + E(Y) = \sum_j x_j P\left(\bigcup_k (A_j \cap B_k)\right) + \sum_k y_k P\left(\bigcup_j (A_j \cap B_k)\right)$

By the addition law for disjoint events,

$P\left(\bigcup_k (A_j \cap B_k)\right) = \sum_k P(A_j \cap B_k)$, and $P\left(\bigcup_j (A_j \cap B_k)\right) = \sum_j P(A_j \cap B_k)$


Hence,

$E(X) + E(Y) = \sum_j \sum_k (x_j + y_k) P(A_j \cap B_k)$.

By definition, $E(X + Y) = \sum_j \sum_k (x_j + y_k) P(A_j \cap B_k)$, since for all points in the
intersection of $A_j$ and $B_k$, the variable X + Y has the value $x_j + y_k$.

The result can be extended to any finite number of variates.

1.15 Independent Variates We have defined the concept of independence
for events, in § 1.10. Classes of events $\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots$ are independent if
$A_j, B_k, C_l, \ldots$ are independent, where $A_j$ is any member of $\mathcal{A}$, $B_k$ any member
of $\mathcal{B}$, and so on. Random variables X, Y, Z ... are independent if the partitions
of $\mathcal{U}$ on which they are defined are independent, that is, if
$X = \sum_j x_j I_{A_j}$, $Y = \sum_k y_k I_{B_k}$, etc., where each $A_j$ is independent of each $B_k$, etc.

FIG. 8 DISJOINT SUBSETS OF A SET A


THEOREM 1.17 If X and Y are independent,

$E(XY) = E(X) \cdot E(Y)$


By definition, $E(XY) = \sum_j \sum_k x_j y_k P(A_j \cap B_k)$. Since $A_j$ and $B_k$ are independent,
$P(A_j \cap B_k) = P(A_j) P(B_k)$. Therefore,


$E(XY) = \sum_j \sum_k x_j y_k P(A_j) P(B_k)$

$= \sum_j x_j P(A_j) \sum_k y_k P(B_k)$

$= E(X) \cdot E(Y)$.


This result also can be extended to any finite number of variates.
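
The role of independence in the product rule for expectations can be seen by enumeration on the two-dice space. In this sketch X and Y are the spots on the first and second die (independent), while S = X + Y is introduced as a contrasting variate that depends on X; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely points (first die, second die).
SPACE = list(product(range(1, 7), repeat=2))
WEIGHT = Fraction(1, len(SPACE))

def expect(g):
    """Expectation of the variate g over the 36 equally weighted points."""
    return sum(g(pt) * WEIGHT for pt in SPACE)

def X(pt):
    return pt[0]            # spots on the first die

def Y(pt):
    return pt[1]            # spots on the second die

def S(pt):
    return pt[0] + pt[1]    # total spots, not independent of X

# Independent pair: the two sides agree.
e_product = expect(lambda pt: X(pt) * Y(pt))
print(e_product, expect(X) * expect(Y))

# Dependent pair: the two sides differ.
e_dependent = expect(lambda pt: X(pt) * S(pt))
print(e_dependent, expect(X) * expect(S))
```

The additive rule E(X + Y) = E(X) + E(Y) holds in both cases, as can be confirmed by comparing `expect(S)` with `expect(X) + expect(Y)`.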


1.16 Continuous Probability In some problems we cannot divide the whole
space of possibilities $\mathcal{U}$ into a finite (or even a denumerable) set of regions $A_j$
in each of which X takes a definite value $x_j$. What happens is that X varies
continuously over some interval and there is a probability f(x) dx that it takes a
value between x and x + dx. In such a case f(x) is called a probability density,
or simply a density. It is a single-valued, non-negative function, integrable over
the whole domain of definition (say from x = a to x = b), and such that


$\int_a^b f(x)\, dx = 1$.

It is customary to take the domain of x as the whole real axis from $-\infty$ to
$+\infty$, and to put f(x) = 0 over any intervals which correspond to impossible
values of X. It may happen, for instance, that X is by nature non-negative, in
which case f(x) = 0 for all negative values of x.

If the event A corresponds to a value of x between $\alpha$ and $\beta$, the probability of
A is defined by


$P(A) = \int_{\alpha}^{\beta} f(x)\, dx$


Note that here the function f(x), defined on the axis of real numbers, induces a
measure on $\mathcal{U}$, the measure of the interval dx being f(x) dx. The probability that
$X \leq x$ is given by


(1.16.1) $F(x) = \int_{-\infty}^{x} f(u)\, du$


This function, F(x), is called the cumulative distribution function (often simply the
distribution function) corresponding to the density f(x). F(x) is a non-negative,
never-decreasing function, with values lying between 0 and 1, inclusive.

For a discrete distribution, in which X takes the distinct values $x_j$ (j = 1, 2,
3 ...) with probabilities $f(x_j)$, the distribution function is defined as

(1.16.2) $F(x) = \sum_{x_j \leq x} f(x_j)$

This is a step-function (see Figure 9a). It increases by a finite amount $f(x_j)$ at
each point $x_j$, but remains constant in between. At each point $x_j$, F(x) has the
value at the top of the riser, and so is continuous on the right but not on the left.
Figure 9b shows a typical distribution function for a continuous variate.


FIG. 9 (a) DISCRETE DISTRIBUTION FUNCTION; (b) CONTINUOUS DISTRIBUTION FUNCTION


EXAMPLE 8 (Bertrand’s problem) What is the probability that if a chord is 
drawn at random in a circle of radius a, the length of the chord will be greater 
than a? 

This illustrates a difficulty that often arises in such problems. There is no
unique answer unless the words "at random" are more carefully defined.

Suppose we pick a point A anywhere on the circle and draw the diameter AOA'
through A. Any chord AB will make an angle $\theta$ with AA' which lies between
$-\pi/2$ and $\pi/2$. If this angle is between $-\pi/3$ and $\pi/3$ (so that AB is between AC'
and AC), the condition l > a is satisfied. The probability required is therefore

$\int_{-\pi/3}^{\pi/3} f(\theta)\, d\theta \Big/ \int_{-\pi/2}^{\pi/2} f(\theta)\, d\theta$,

where $f(\theta)$ is the probability density for $\theta$. On the assumption that all possible
values of $\theta$ are equally likely, $f(\theta) = 1/\pi$, and the probability reduces to 2/3.


FIG. 10 BERTRAND'S PROBLEM

FIG. 11 BUFFON'S PROBLEM


Another possible procedure for drawing the chord "at random" would be
to select A as before and pick a point D on AO. The perpendicular EF to AO
passing through D defines a chord of the circle. If OD = x, EF will be greater
than a whenever $x < \sqrt{3}\,a/2$. If the probability density for x is f(x), the required
probability is

$\int_0^{\sqrt{3}a/2} f(x)\, dx \Big/ \int_0^a f(x)\, dx$.

On the assumption that all possible values of x are equally likely, f(x) = 1/a, and
the probability reduces to $\sqrt{3}/2 = 0.866$.

The two answers correspond to different induced measures, depending on the
way we conceive the chord drawn at random. It would not be possible to settle
the question by resort to experiment, because in devising any experimental
set-up for drawing a random chord (such as tossing a straight piece of wire
on to the table-top on which the circle is drawn) we would need to choose one
particular interpretation of the random process.
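
The dependence of the answer on the meaning of "at random" can be seen in a simulation. The sketch below draws chords by each of the two procedures described in this example and estimates the fraction longer than the radius; the function names, the number of trials, and the fixed seed are illustrative assumptions.

```python
import math
import random

def chord_by_angle(a, rng):
    """Chord from a fixed point A at a uniform angle to the diameter."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return 2 * a * math.cos(theta)

def chord_by_distance(a, rng):
    """Chord perpendicular to a radius at a uniform distance x from the centre."""
    x = rng.uniform(0, a)
    return 2 * math.sqrt(a * a - x * x)

def estimate(method, a=1.0, trials=200_000, seed=0):
    """Fraction of simulated chords longer than the radius a."""
    rng = random.Random(seed)
    return sum(method(a, rng) > a for _ in range(trials)) / trials

print(estimate(chord_by_angle), estimate(chord_by_distance))
```

Each estimate settles near the value derived for its own procedure, which is the point of the example: the simulation must commit to one interpretation before it can be run.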

EXAMPLE 9 (Buffon's problem) Parallel straight lines, a distance a apart,
are ruled on a horizontal table. A straight piece of wire, or needle, of length
l < a, is tossed at random on to the table. What is the probability that it crosses
a line?

If we take the x-axis along one of the lines and the y-axis perpendicular, it
is easy to see that the x-coordinate of the centre of the needle is irrelevant. It
is the y-coordinate of the centre and the angle $\theta$ made by the needle with the
x-axis which matter (Fig. 11).

The needle will cross the nearest line if the distance y of its centre from this
line is less than $\tfrac{1}{2} l \sin\theta$. The domain of y is from 0 to a/2 and that of $\theta$ from 0 to
$\pi$. If the joint probability density for y and $\theta$ is $f(\theta, y)$, the required probability is


$p = \int_0^{\pi} \int_0^{\frac{1}{2} l \sin\theta} f(\theta, y)\, dy\, d\theta \Big/ \int_0^{\pi} \int_0^{a/2} f(\theta, y)\, dy\, d\theta$


The assumption of random tossing means that we regard all possible values
of $\theta$ and y as equally likely. If so, $f(\theta, y)$ is constant and the probability becomes


$p = \int_0^{\pi} \int_0^{\frac{1}{2} l \sin\theta} dy\, d\theta \Big/ \int_0^{\pi} \int_0^{a/2} dy\, d\theta$

$= \dfrac{l}{\pi a} \int_0^{\pi} \sin\theta\, d\theta = 2l/(\pi a)$


This result suggests an empirical method of approximating to the value of $\pi$.
Various trials of the method have been made from time to time (see, for example,
reference [12]).
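
The empirical method can be imitated with pseudo-random numbers. The sketch below tosses a simulated needle, counts crossings, and inverts p = 2l/(πa) to estimate π. The lengths, trial count and seed are illustrative assumptions, and the simulation itself uses `math.pi` to draw the angle, so it demonstrates the formula rather than providing an independent determination of π.

```python
import math
import random

def buffon_crossing_fraction(l, a, trials=200_000, seed=0):
    """Fraction of random tosses in which a needle of length l crosses a line,
    for parallel lines a distance a apart (l < a)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        y = rng.uniform(0, a / 2)          # distance of centre from nearest line
        theta = rng.uniform(0, math.pi)    # angle between needle and the lines
        if y < (l / 2) * math.sin(theta):
            crossings += 1
    return crossings / trials

l, a = 1.0, 2.0                      # illustrative lengths with l < a
p_hat = buffon_crossing_fraction(l, a)
pi_estimate = 2 * l / (a * p_hat)    # invert p = 2l/(pi a)
print(p_hat, pi_estimate)
```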


1.17 Random Numbers The importance of ensuring that a sample is
effectively random if a valid inference is to be made from the sample to the
population, has already been mentioned. Various tests of randomness have been
devised and some will be mentioned later (see §§ 10.13 and 10.14). The choosing
of a random sample, even from an artificial population of cards, balls, discs, or
the like, is not easy, since the mechanical shuffling or mixing may be far from
adequate, cards may tend to stick together, balls may not be equally smooth, and
so on. When it is necessary to pick random samples from a crop in the ground
or from a group of experimental animals, the task is much harder. There is a
natural tendency to pick what seem to be typical rather than truly random
samples.

Experience has shown that the best method is to use a set of random numbers. 
These numbers have usually been obtained by some mechanical process, such 
as a very carefully made roulette wheel, and have been thoroughly tested for 
randomness. They are generally arranged in sets of two or four digits and 
grouped in thousands. A short extract from one such table [13] is given in the 
Appendix, Table B.1. The largest table up to the present time contains a million
random digits [14] and was prepared because of the increasing need for very 
large blocks of random numbers in some modern statistical techniques. 

Random numbers may be used to simulate the results of a chance experi- 
ment, such as tossing a coin. Instead of actually tossing a real coin (assumed to 
have a probability for heads equal to 0.5), one may open a table of random 
numbers, jab a forefinger anywhere on the page and start reading random 
numbers (in pairs) from the place so indicated. Whenever the number is 
between 00 and 49 inclusive it is read as "head", and when it is between 50 and 99
inclusive as "tail". Thus the succession of numbers 55, 58, 79, 50, 56, 01, 51,
65, 92, 32, 21, 66, 35, 18, 65, 08 ... will be recorded as the following sequence
of heads and tails: T T T T T H T T T H H T H H T H .... This arrangement,
which the author obtained on his first trial, is a random sequence and
could easily happen with an actual coin, but it is not what we would write down
naturally if we tried to forecast the result of such an experiment. Although
random, it is hardly typical.

In order to use random numbers to draw samples from a given population it 
is necessary to allot numbers, or blocks of numbers, to elements in the population. 
Thus if we have a population of 600 from which we want to draw a sample of 20, 
we number the members of the population (the data for each individual may be 
entered on a numbered card, for example) and then we read off consecutive 
random numbers in groups of three digits, ignoring all numbers over 600 and 
disregarding repetitions. The numbers so obtained, such as 284, 444, 323, 424, 
358, 521, 406, 565, 457, 078..., represent the individuals selected for the 
ation consists of several classes, the members of any one 
class being practically alike, and we want a sample from this population, we 
simply have to know how many members (or what proportion of members) there 
arein each classin the population, andallota block of random numberstoeachclass. 
The size of the block should be proportional to the number of members in the 
class. Every random number that belongs to a particular block indicates a 
member of that class drawn for the sample. Thus, if there are five classes in the 
population, say A, B, C, D, E, numbering respectively 80, 200, 450, 240 and 30
individuals, we can allot blocks of four-digit random numbers as follows:
A(0000 to 0799), B(0800 to 2799), C(2800 to 7299), D(7300 to 9699), E(9700 to
9999). The size of each block of numbers (800 for A, 2000 for B, etc.) is
proportional to the size of the corresponding class in the population. The following
set of random numbers: 6469, 7152, 0256, 6137, 0458, 0968, 9610, 5778, 8500,
8981, would indicate a sample C, C, A, C, A, B, D, C, D, D, consisting of
two A's, one B, four C's and three D's.
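The block allotment just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `build_blocks` and `classify` are hypothetical, and the blocks are taken to be consecutive ranges of the 10,000 four-digit numbers, sized in proportion to the class sizes given in the text.

```python
import bisect

# Class sizes taken from the example in the text.
class_sizes = {"A": 80, "B": 200, "C": 450, "D": 240, "E": 30}

def build_blocks(sizes, total_numbers=10_000):
    """Return class labels and the exclusive upper end of each class's block."""
    population = sum(sizes.values())
    labels, upper_ends, running = [], [], 0
    for label, size in sizes.items():
        running += size
        labels.append(label)
        upper_ends.append(running * total_numbers // population)
    return labels, upper_ends

def classify(number, labels, upper_ends):
    """Map a four-digit random number to the class whose block contains it."""
    return labels[bisect.bisect_right(upper_ends, number)]

labels, upper_ends = build_blocks(class_sizes)
random_numbers = [6469, 7152, 256, 6137, 458, 968, 9610, 5778, 8500, 8981]
sample = [classify(n, labels, upper_ends) for n in random_numbers]
print(sample)
```

With the ten random numbers quoted above, the sketch reproduces the sample of two A's, one B, four C's and three D's.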

Random numbers may also be used in many other ways. One common 
requirement in experimental work is to randomize the order of a group of objects,
such as plots of land in a block, where the plots are to have different treatments.
To randomize nine objects, or in other words, to form a random permutation of
the integers 1 to 9, we can read off any set of consecutive random digits, ignoring
zeros and repetitions. Thus, the set 347 66455664901 566368 802
gives the order 347659182. Modifications of this method can be used with
groups of larger size. The important thing in randomization is to use an
impersonal, objective method and not to trust to intuition.
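As a rough sketch of the rule "ignore zeros and repetitions," the following Python fragment reads a string of random digits and builds the permutation. The function name `permutation_from_digits` is an assumption made for illustration; the digit string is the one quoted in the text.

```python
def permutation_from_digits(digits, n=9):
    """Build a random order of 1..n from a stream of random digits,
    skipping zeros, digits above n, and digits already used."""
    order = []
    for ch in digits:
        if not ch.isdigit():
            continue  # skip the spaces between digit groups
        d = int(ch)
        if d == 0 or d > n or d in order:
            continue
        order.append(d)
        if len(order) == n:
            break
    return order

order = permutation_from_digits("347 66455664901 566368 802")
print("".join(str(d) for d in order))
```

If the digit stream runs out before all n objects have been placed, the returned list is shorter than n and more digits must be read.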

Although Table B.1 is known to satisfy several tests of randomness, any 
limited collection of random numbers is bound to show some peculiarities. 
Accordingly, the table should not be used over and over again. If very large 
blocks of random numbers are needed, recourse should be had to larger tables
such as [13] and [14].


PROBLEMS


A (§§ 1.7-1.10)

1. Suppose that of a group of people surveyed, 30% own both a house and a car,
40% own a car but not a house, 10% a house but not a car, and 20% own neither.
Illustrate by a Venn diagram. What is the percentage that own either a house or a car
or both? What is the fraction of car owners that are also house owners?

2. Let A stand for the event that a man chosen at random from a certain population
is overweight and B for the event that he is over 50. Write down the symbols for the
probabilities that (a) he is not overweight, (b) if he is overweight he is also over 50,
(c) if over 50 he is not also overweight.

3. If P(A) = }, P(B) = 1 and $P(A \cup B) = 11/12$, what is $P(A \cap B)$? Find also P(A|B)
and P(B|A).

4. If A and B are independent events, and if P(A) = } and P(B) = }, what is
$P(A \cup B)$? Hint: Use Eq. (1.10.4).

5. Prove from the definition of conditional probability that
$P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)$.

6. Write out the detailed proof of Eq. (1.7.11).

7. Two good dice are rolled simultaneously. Let A denote the event "the sum shown
is 8" and B the event "the two show the same number." Find P(A), P(B), $P(A \cap B)$,
$P(A \cup B)$, P(A|B) and P(B|A).


B (§ 1.11) 

1. How many five-digit numbers are there with every digit odd? How many with 
no digit lower than 6? 

2. How many arrangements can be made of the letters of the word "caught" if the 
vowels are always together and in the same order? 

3. Four strangers board a bus in which there are six empty seats. In how many 
different ways can they be seated? 

4. Six papers are set in an examination, two of them in mathematics. In how many 
different orders can the papers be given if the two mathematics papers are not to be 
successive ? 

5. Show that the number of ways in which p positive and n negative signs (p > 0,
$0 \le n \le p + 1$) may be placed in a row so that no two negative signs shall be together
is $\binom{p+1}{n}$. Hint: With the positive signs placed in a row, there are p + 1 possible
places for the first negative sign, p for the second, and so on. The negative signs are
not distinguishable.


6. Prove that (") = ( s 


Ду = P 


7. Prove that $r\binom{n}{r} = n\binom{n-1}{r-1}$. Hence, show that $\sum_{r=1}^{n} r\binom{n}{r} = n\,2^{n-1}$.
Hint: $\sum_{r=1}^{n} \binom{n-1}{r-1} = \sum_{r=0}^{n-1} \binom{n-1}{r}$. Put x = 1 in Theorem 1.13.

8. If (| = (i) what is n? 


9. At a long dinner table the host and hostess sit opposite each other at the ends.
In how many ways can 2n guests be arranged (n on a side) so that two particular guests
do not sit together? Hint: Place these two guests first.

10. Prove that $\sum_{j=0}^{k} \binom{n}{j}\binom{m}{k-j} = \binom{n+m}{k}$, k < m. Hint: Use Theorem 1.13 and
the identity $(1 + x)^{n+m} = (1 + x)^{n}(1 + x)^{m}$.

11. Use Problem 10 to prove that
E [ke 2k 
(а) 2 d k (2 ) 
m+ 1 тү (om 
ы (1) (0) (6) 

C (§ 1.12)

1. If four cards are drawn at random from a deck of 52 cards, what is the probability
that there will be one card of each suit?

2. A bag contains nine white and three black balls. If five balls are drawn, without
replacement, what is the probability that at least two are black?

3. What is the chance that a bridge hand of 13 cards contains both the ace and king
of spades?

4. A batch of 1,000 lamps is known to have 5% defectives. If five lamps are chosen
at random and tested, what are the probabilities that (a) none is defective, (b) there are
exactly two defectives?

5. A factory produces screws, put up in boxes of 100. Boxes are inspected by taking
20 screws at random and rejecting the box if any defects are found in the sample.
What is the probability of passing a box that contains two defective screws?

6. Calculate the probability of throwing a 6 with an ordinary die at least once in
six trials.

7. A room has three lamp sockets. From a collection of 10 light bulbs, of which
only six are good, I select three at random and put them in the sockets. What is the
probability that I shall have light? Hint: Find the probability of not getting light, i.e.,
of selecting three bad bulbs.

8. A and B take turns in throwing two dice, the winner being the first to throw 9.
Show that if A has the first throw, their respective chances of winning are in the ratio 9/8.
Hint: A may win on the 1st, 3rd, 5th, ... throws, and these possibilities are mutually exclusive.

9. Cards are dealt from a well-shuffled deck until an ace appears. Show that the
probability that exactly n cards will be dealt before the first ace is
$4(48)!\,(51 - n)!/[52!\,(48 - n)!]$.

10. (The matching problem). A man writes four letters and addresses four envelopes.
His secretary puts the letters in the envelopes at random. Show that the probability that
at least one letter gets into its right envelope is 1 - 1/(2!) + 1/(3!) - 1/(4!). Generalize
for n letters. Hint: Use Eq. (1.7.12). Let $A_j$ denote the event that the $j$th letter gets into the
right envelope. The probability required is $P(A_1 \cup A_2 \cup A_3 \cup A_4)$. For n letters the
result is close to $1 - e^{-1} = 0.632$ (see Appendix A.1). The approximation is correct
to the third figure for n > 6.

11. In a gambling game, a player may deal 10 cards from a well-shuffled bridge
deck and wins if, at any stage of the dealing, the number on a card is the same as the
number of cards dealt. (Face cards are assigned the number zero). Find the probability
that the dealer will win. Hint: This is a slightly more general matching problem (see
Hint to Problem 10). Show that $P(A_j) = 4 \cdot 51!/52!$, $P(A_j A_k) = 4^2 \cdot 50!/52!$, etc.

12. Ten absent-minded professors, each with a hat, attend a meeting and each man
leaves with one of these hats chosen at random. What is the approximate probability
that no one gets his own hat? What is the probability that exactly nine men get their
own hats?

13. A bridge player and his partner have nine spades between them. What are the
respective probabilities that the other four spades are split between the opponents 4-0,
3-1, 2-2?

14. Twelve cards have been dealt, six down and the other six showing a jack, two
kings, a 7, a 5 and a 4. What is the probability that the next card will be a 4 or less, ace
counting low? Hint: The six cards down do not affect the answer.

15. From an urn containing 10 balls, numbered from 1 to 10, balls are drawn one
at a time and placed in a straight row of holes also numbered 1 to 10. If each ball is
placed in its proper hole, what is the probability that there will not be an empty hole
between two filled ones at any time of the drawing? Hint: When k holes are filled,
there are two favorable positions for the next ball, unless the k holes include an end
hole, when there is only one favorable position. Multiply the probabilities of success
for k = 1 to 9.

16. Prove that the probability that some one of the four hands of cards in a particular
bridge deal contains all 13 cards of a suit is about 1 in 40 thousand millions. (More
hands of this character have been reported than would be expected. This fact may be
due to imperfect shuffling in actual play.)


D (§§ 1.13-1.15)

1. A bag contains five nickels and a quarter, all wrapped separately so as to be
indistinguishable. A boy is allowed to draw one coin at a time and keep it until he
draws the quarter, when he must stop. What is his expectation?

2. A tosses six pennies and agrees to pay B $6 if either six heads or six tails appear
and $5 if either five heads or five tails appear. In every other case he takes B's stake.
How much should this stake be to make the game fair? Hint: If the stake is $x,
calculate B's expectation and put it equal to zero.

3. A coin is tossed until a head appears and the number of tails obtained is recorded.
Find the probability of getting x tails before the first head, and the expected value of x.
Hint: 1 + 2r + 3r^2 + ... = (1 - r)^(-2), |r| < 1.

4. In an infinite series of independent trials of an event with constant probability p
of success in a single trial, what is the expectation of the number of failures preceding the
first success?

5. From a deck of 13 spades a person draws cards one at a time, replacing each
time, until he draws the ace. What is the expectation of the number of cards drawn?

6. What is the mathematical expectation of the sum of points on n dice, tossed at
the same time?

7. A tosses 3 pennies and B two, and the winner is the one with the greatest number
of heads. The winner takes the combined stakes. If there is a tie they continue tossing
until a decision is reached. How much money should A put up on a game, to each
dollar that B puts up, to make the game fair? (A game is a set of tosses leading to a
decision. Theoretically, a game might go on indefinitely, but the probability of this
is zero.)

8. (The Petersburg Paradox). A tosses a coin repeatedly, having agreed to give B
$2^n if n tails appear before the first head. (Thus B receives $1 if the first toss is a head,
$2 if one tail precedes a head, and so on). If, however, 10 tails appear in succession
before a head, the game stops there and B receives $2^10. What sum should B pay A for
this privilege? Note: The paradox arises from the fact that if the game is allowed to go
on indefinitely, B's expectation is infinite. This seems contrary to common sense and
has been the subject of much discussion. Even in the limited game, B would be foolish
to pay a sum equal to his expectation, unless he intends to play the game a few thousand
times. See, e.g., [3] and [9].

9. Five cards are drawn at random from a deck without replacement, looked at,
and then replaced. This is done 1,000 times. How often would you expect to get:
(a) 5 of one suit, (b) 4 of one suit, (c) 3 of one suit, 2 different, (d) 3 of one suit, 2 of another,
(e) 2 of each of two suits, 1 different, (f) 2 of one suit, 3 different? Hint: The expected
number in each case is 1,000 times the probability of that combination.


E (§ 1.16) 
1. Let $A_1, A_2, A_3, A_4, A_5$ be sub-sets of the two-dimensional x-y plane, defined
as follows:
$A_1$ = set of (x, y) such that $x \le 2$, $y \le 4$, denoted by $\{x \le 2, y \le 4\}$,

$A_2 = \{x \le 2, y \le 1\}$,  $A_3 = \{x \le 0, y \le 4\}$,

$A_4 = \{x \le 0, y \le 1\}$,  $A_5 = \{0 < x \le 2, 1 < y \le 4\}$.
Given that $P(A_1) = 7/8$, $P(A_2) = 1/2$, $P(A_3) = 3/8$, and $P(A_4) = 1/4$, find $P(A_5)$.
Illustrate by a diagram.

2. A circle of diameter 8 in. is drawn in the interior of a square of side 12 in. A
penny (diameter 1 in.) is dropped at random on the square, which is lying on a hori- 
zontal table. If only those cases are counted when the penny lies wholly inside the 
square, what is the probability that it is also wholly inside the circle? Hint: The center 
of the penny is equally likely to fall anywhere within the area open to it. Calculate the 
possible area and the favorable area. 

3. The floor of a large room is made of hardwood, laid in strips 1 in. wide, with 
cracks between of negligible width. A coin of diameter 1} in. is dropped on the floor. 
What is the probability that it touches three strips? 

4. A third method of drawing a chord "at random" in a circle (see Example 8
above, § 1.16) is to select the center of the chord at random and then draw the chord.
When the center is determined, so is the whole chord. If the center is equally likely to
be anywhere within the given circle, show that the answer to Bertrand's problem is 1/4.

5. A thin stick of length a is broken into three pieces. What is the probability that 
these pieces can be arranged to form a triangle? Hint: No piece may be longer than
a/2. If x, y, z are the three lengths, they satisfy the condition x + y + z = a, which
represents the part of a plane contained in the positive octant. Find the area of the 
part of this plane corresponding to the given condition. 

6. A diamond of value V is broken into two pieces. If the value of a diamond 
varies as the square of its weight, what is the expected value of the broken diamond? 
Hint: If w is the original weight, the probability that one piece has a weight x to
x + dx is dx/w, 0 < x < w.

Chapter 2


FREQUENCY DISTRIBUTIONS, FRACTILES 
AND MOMENTS 


2.1 Frequency Distribution in a Sample To facilitate applications of 
statistical inference, the data supplied by observation or experiment are usually 
organized and summarized to expose their essential characteristics. Some of 
the methods and techniques for extracting the essential information supplied by 
data, which are to be regarded as constituting a sample, will be presented in this 
chapter. Incidentally, we might point out that statistical inference is not con- 
cerned solely with data that have already been obtained. An important branch 
of modern statistical theory deals with the design of experiments and shows the 
experimenter how to arrange his work so as to be able subsequently to extract 
the maximum of information from a limited amount of data. 

Data obtained in an experiment, or as the result of an enquiry or question- 
naire, are often presented in a table. A common form is the frequency table, in 
which one column gives observed values x of a random variable X and the other
gives the frequency with which each of these values was obtained. Recall that X
is a function on the space of events to the real axis. Its domain, if discrete, is the
set of events $A_j$ and its range is the set of distinct real numbers $x_j$. If X is
continuous, its range usually includes all real numbers in some interval. In this
case the range is divided up into convenient sub-intervals and the frequencies 
corresponding to the various sub-intervals (or classes) are entered in the table. 

Table 2.1 records some data obtained by D. A. S. Fraser [1] in tossing a 
crudely made plastic die. The variable X is here the number of spots observed, 
and is of course discrete. The frequencies f of the six values observed in the first
400 tosses are given. The third column will be referred to later. 


TABLE 2.1 
x f F 
1 73 73 
2 83 156 
3 80 236 
4 57 293 
5 41 334 
6 66 400 
400 
Table 2.2 similarly presents a distribution of weights, measured to the
nearest pound, for a sample of 1000 eight-year-old girls in Glasgow. The weight
may be regarded as a continuous variable over a range of about 40 lb, divided
into 10 sub-intervals, each of width 4 lb. Since a measured weight of 39 lb
would be recorded for any child between 38.5 and 39.5 lb, the real limits for these
sub-intervals (often called the class boundaries) are 27.5 lb, 31.5 lb, 35.5 lb, and
so on. The central values of the sub-intervals are called class-marks and will be
denoted by $x_c$. The upper class boundaries (ends of sub-intervals) will be denoted
by $x_e$.


TABLE 2.2

Measured Class   Class-Mark   Frequency   Upper Class        Cumulative
Limits (lb)      $x_c$ (lb)   f           Boundary $x_e$ (lb)  Frequency F
28 - 31          29.5            1        31.5                  1
32 - 35          33.5           14        35.5                 15
36 - 39          37.5           56        39.5                 71
40 - 43          41.5          172        43.5                243
44 - 47          45.5          245        47.5                488
48 - 51          49.5          263        51.5                751
52 - 55          53.5          156        55.5                907
56 - 59          57.5           67        59.5                974
60 - 63          61.5           23        63.5                997
64 - 67          65.5            3        67.5               1000

                              1000


2.2 Cumulative Frequency Distributions It is often convenient to present the 
data of a frequency table in a slightly different form, recording for suitable values 
of x the total number of items in the sample which have an observed X equal to 
or less than x. For the discrete distribution of Table 2.1, the third column gives
these cumulative frequencies (accumulated by adding the ordinary frequencies 
one by one from the top of the column downwards). They are denoted by F. 

For a grouped distribution like that of Table 2.2, it is usually convenient to 
choose the upper class boundaries $x_e$ as the selected values of x corresponding to
F. There will be no measured values actually coinciding with $x_e$ (since the
measured values are all recorded to the nearest unit and the $x_e$ to half a unit) and
therefore the cumulative frequency gives the number of items with X less than $x_e$.
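The accumulation "from the top of the column downwards" is a running sum. A minimal Python sketch, using the frequencies of Tables 2.1 and 2.2 (the function name `cumulative_frequencies` is chosen here purely for illustration):

```python
from itertools import accumulate

def cumulative_frequencies(freqs):
    """Running totals F, obtained by adding the frequencies one by one."""
    return list(accumulate(freqs))

freq_2_1 = [73, 83, 80, 57, 41, 66]                      # Table 2.1
freq_2_2 = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]    # Table 2.2

cum_2_1 = cumulative_frequencies(freq_2_1)
cum_2_2 = cumulative_frequencies(freq_2_2)
print(cum_2_1)
print(cum_2_2)
```

The last entry of each list is the sample size N, which gives a quick check on the addition.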


2.3 Graphical Representation  A frequency table for a discrete distribution
may be represented graphically by drawing ordinates equal to the frequency on a
convenient scale at the various values of x. Thus Figure 12 corresponds to
Table 2.1. The tops of the ordinates may be joined by straight lines, but these
are merely to assist the eye and have no significance at intermediate values of x.

FIG. 12 FREQUENCY GRAPH FOR DISCRETE VARIATE


A continuous distribution, grouped in classes, is usually represented 
graphically by a histogram, as in Figure 13 (for the data of Table 2.2). The 
rectangles are drawn with bases corresponding to the true class intervals and 
with heights proportional to the frequencies. With all the class intervals equal, 
as in this example, the areas of the rectangles also represent the corresponding 
frequencies. In some tables the class intervals are not all equal, and then the 
heights must be suitably adjusted to make the areas proportional to the fre- 
quencies. 

If the mid-points of the tops of the rectangles are joined by straight lines, the
result is a frequency polygon, which may also be regarded as representing the
data. The frequency polygon for Table 2.2 is shown dotted in Figure 13.

FIG. 14 CUMULATIVE FREQUENCY POLYGONS


Graphs of the cumulative frequencies in Tables 2.1 and 2.2 are shown in 
Figure 14. The first is a step-diagram, similar to Figure 9a, in which at each 
value of x the frequency of values equal to or less than x is plotted. In Figure 
14b there are no measured values equal to the values at the ends of the intervals, 
and the plotted points represent the frequency of values less than x. 


2.4 Frequency Curves and Ogives The observed data usually refer to a 
sample of finite size which may be considered as representative of a very large, or 
practically infinite, population. The data of Table 2.1 refer to a sample of 400 
tosses out of the indefinitely large number of tosses which could conceivably be 
made with this particular die, given unlimited time and patience, before the die 
finally wears out. The population of eight-year-old Glasgow schoolgirls, from 
which the sample of Table 2.2 was taken, is not infinite but is certainly large. 
With a very large population and a continuous variate, we can imagine the class 
intervals as being very short while still containing many observed values in each 
class (we must suppose that the measurements are correspondingly accurate). 
The frequency polygon will then approximate to a smooth frequency curve which 
represents the distribution of the variable X in the population. The area under
this curve, between two fixed values a and b, represents the total number of
individuals in the population with values of X between a and b. If instead of the
total frequency we consider the curve as giving the relative frequency (the
proportion of values in this interval), the curve then represents the probability
density in the population. The area between X = a and X = b is the probability
that a random item from the population has a value between a and b.

One task of practical statistics is to determine whether a given sample can 
reasonably be regarded as coming from a population of a particular kind. 
Usually some plausible assumption is made about the form of the frequency curve 
(or probability density curve) for the population, and parameters defining the 
exact shape and position of the curve are calculated so as to make it fit the 
frequency polygon for the sample as well as possible. Some test is then applied to 
find out whether the fit can reasonably be regarded as satisfactory. This process 
is called curve-fitting, and some illustrations will be given later. 

Just as the frequency polygon for a very large sample, with small class 
intervals, approximates to a smooth frequency curve, so the cumulative fre- 
quency polygon for such a sample approximates to a smooth curve called an 
ogive. If relative frequencies are used, the ogive becomes identical with the graph 
of the distribution function for the variate X in the population (see Fig. 9b). 

For a discrete distribution, the relative frequency corresponding to a 
particular value of X is an approximation to the actual probability of this value 
of X in the population sampled. If some prior hypothesis is made about these 
probabilities it may be tested by noting the agreement of the observed relative 
frequencies with the predicted values. We might, for example, use the data of 
Table 2.1 to test the hypothesis that all faces of this particular die are equally 
likely to turn up (see § 10.2).


2.5 The Median and Other Fractiles The median of a sample of size N is the 
value of x for which the cumulative frequency is equal to N/2. In other words, it 
is that value of x which is exceeded by half the members of the sample. For this 
reason it is often used as an average, that is, a single (more or less central) value 
which may be regarded as in some sense typical of the whole sample. For a 
small sample the median is the middle one when the items are arranged in order
(if the number of items is even, the median is usually taken half-way between the 
two middle ones). 

The median of a population with distribution function F(x) is that value $\tilde{x}$ for
which $F(\tilde{x}) = 0.5$. The median is therefore easily marked on an ogive or on a
cumulative frequency polygon. In Fig. 14b, $\tilde{x}$ is the abscissa of the point on the
polygon with ordinate N/2 = 500. If this point lies (as it will usually) on one
of the straight sides of the polygon, the abscissa may be calculated by linear
interpolation between the values at the beginning and end of this side. Thus in
Table 2.2, N/2 = 500. The value 488 of F corresponds to an x of 47.5 lb and the
value 751 to an x of 51.5 lb. The value x corresponding to F = 500 is therefore

$47.5 + \frac{500 - 488}{751 - 488} \times 4 = 47.68$ lb.

The assumption underlying this computation is that the items of the sample in
any class may be regarded as having values of x which are distributed
approximately uniformly over the whole class interval.
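The interpolation can be written out as a short Python sketch. The helper `value_at_cumulative` is a hypothetical name; it takes the class boundaries and cumulative frequencies of Table 2.2 and assumes, as the text does, that values are spread uniformly within each class.

```python
def value_at_cumulative(target, lower_bound, upper_bounds, cum_freq):
    """Abscissa at which the cumulative frequency polygon reaches `target`,
    found by linear interpolation within the class that contains it."""
    if not 0 < target <= cum_freq[-1]:
        raise ValueError("target must lie between 0 and the sample size")
    prev_bound, prev_cum = lower_bound, 0
    for upper, cum in zip(upper_bounds, cum_freq):
        if cum >= target:
            fraction = (target - prev_cum) / (cum - prev_cum)
            return prev_bound + fraction * (upper - prev_bound)
        prev_bound, prev_cum = upper, cum

# Upper class boundaries and cumulative frequencies from Table 2.2.
upper_bounds = [31.5, 35.5, 39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5, 67.5]
cum_freq = [1, 15, 71, 243, 488, 751, 907, 974, 997, 1000]

median = value_at_cumulative(cum_freq[-1] / 2, 27.5, upper_bounds, cum_freq)
print(round(median, 2))
```

The result agrees with the hand calculation of 47.68 lb.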
For a discrete distribution, such as that of Table 2.1, the median is not a very
precise quantity. It is obvious that $\tilde{x} = 3$, but there are 80 individual values all
equal to 3 and only 156 out of 400 are definitely less than 3.

The values of x which correspond to cumulative frequencies of N/4 and
3N/4 are called the first quartile and the third quartile respectively (often denoted
by $Q_1$ and $Q_3$). One quarter of the sample is below $Q_1$ and three quarters below
$Q_3$. Similarly, deciles (corresponding to tenths) and percentiles (corresponding to
hundredths) may be defined. The third decile, $D_3$, for instance, is the value of x
corresponding to a cumulative frequency of 3N/10, and might equally well be called
the thirtieth percentile, $P_{30}$. Since these points correspond to certain specified
fractions of the distribution they are collectively called fractiles (or sometimes
quantiles).

In general, the $k$th percentile ($P_k$) corresponds to a cumulative frequency
equal to k% of N. The number k is called the percentile rank of $P_k$. It may be
calculated by interpolation from a table such as Table 2.2.
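Going the other way, from a value of x to its percentile rank, uses the same interpolation on the cumulative frequencies. The sketch below is illustrative only: `percentile_rank` is an assumed name, and the trial weight of 50 lb is a hypothetical value chosen for the example, not one taken from the text.

```python
def percentile_rank(x, lower_bound, upper_bounds, cum_freq):
    """Percentage of the sample lying below x, by linear interpolation
    on the cumulative frequencies of a grouped table."""
    n = cum_freq[-1]
    if x <= lower_bound:
        return 0.0
    prev_bound, prev_cum = lower_bound, 0
    for upper, cum in zip(upper_bounds, cum_freq):
        if x <= upper:
            share = (x - prev_bound) / (upper - prev_bound)
            below = prev_cum + share * (cum - prev_cum)
            return 100 * below / n
        prev_bound, prev_cum = upper, cum
    return 100.0

# Upper class boundaries and cumulative frequencies from Table 2.2.
upper_bounds = [31.5, 35.5, 39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5, 67.5]
cum_freq = [1, 15, 71, 243, 488, 751, 907, 974, 997, 1000]

rank_at_boundary = percentile_rank(47.5, 27.5, upper_bounds, cum_freq)
rank_at_50 = percentile_rank(50.0, 27.5, upper_bounds, cum_freq)
print(rank_at_boundary, rank_at_50)
```

At a class boundary the rank is simply 100F/N, so 47.5 lb has percentile rank 48.8.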

Fractiles are often computed in order to obtain a measure of the spread or 
dispersion of a distribution. The more a distribution is concentrated around a 
central value, the less as a rule will be the distance between $Q_1$ and $Q_3$, or
between say $P_7$ and $P_{93}$, so that the differences $Q_3 - Q_1$ or $P_{93} - P_7$ may be
taken as measuring the dispersion. Deciles are often used by psychologists in
assessing the performance of a student on some aptitude or achievement test.
If, for example, a certain student's score is known to lie between $D_8$ and $D_9$ as
determined from a large group taking the test, we can say that this student is
better than eight-tenths of the group but is not in the top tenth, on this particular
test.


2.6 Fractiles as Statistics  The median and the other fractiles belong to the
class of "statistics," which in this sense (as a plural word) means quantities
calculated from experimental or observational data for a sample and used to make
estimates about the population from which the sample is drawn. The median of
a sample is one statistic which gives information about the population, and the
interquartile range, $Q_3 - Q_1$, is another. However there are some statistics
which are better than others for the purpose of giving reliable information. As
an average, the median has the disadvantage that it does not use all the data
available in the sample. It depends only on the order of the observations and not
directly on their actual size. The median of the numbers 1, 7, 11, 12, 19, 26, and
34 is 12, since this is the middle number, but any other set of seven numbers
arranged in ascending order with 12 in the middle would have the same median.
Statistics differ also in the extent of their sampling fluctuations, that is, in the
extent to which their numerical values vary from one sample to another, of the
same size and drawn from the same population. Other things being equal, a
statistic with the least possible sampling fluctuation will be preferred. It turns
out that in many situations the median has a greater sampling fluctuation than the
arithmetic mean. The mean belongs to the class of statistics known as moments, and
in general the arithmetic mean will be the statistician's preferred average. The
mean has the further advantage that it lends itself more readily than the median
to mathematical manipulation. A fuller discussion of the relative merits of
several different types of average may be found, for example, in [2].


2.7 Moments  Suppose that a discrete variate X can take k distinct values
$x_i$ (i = 1, 2, ..., k), and that $f_i$ individuals in a sample have the value $x_i$. The
total size of the sample is $\sum_i f_i = N$. The $r$th moment of X about zero is defined by

(2.7.1)    $m'_r = N^{-1} \sum_{i=1}^{k} f_i x_i^r$

Obviously $m'_0 = 1$. The most important case is when r = 1; the statistic $m'_1$ is
called the arithmetic mean, and it will be convenient to denote it simply by m.
The notation $\bar{x}$ is also commonly used, and serves to indicate that the arithmetic
mean is a quantity of the same physical nature as X. If X is a length measured in
centimeters, then $\bar{x}$ will also be a length in centimeters.

To calculate m for a distribution such as that of Table 2.1, it is merely
necessary to form a column of values of fx and total it (see column 3 of Table
2.3). Here m = 1308/400 = 3.27.

TABLE 2.3

x       f      fx    $fx^2$    $fx^3$
1      73      73       73       73
2      83     166      332      664
3      80     240      720     2160
4      57     228      912     3648
5      41     205     1025     5125
6      66     396     2376    14256

      400    1308     5438    25926


The fourth and fifth columns of the table give the second and third moments
respectively. Here $m'_2 = 5438/400 = 13.595$, and $m'_3 = 25926/400 = 64.815$.
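The column totals of Table 2.3 can be reproduced directly from definition (2.7.1). The following Python sketch uses the data of Table 2.1; the function name `moment_about_zero` is an assumption made for this illustration.

```python
def moment_about_zero(values, freqs, r):
    """r-th moment about zero: (1/N) * sum of f_i * x_i**r."""
    n = sum(freqs)
    return sum(f * x**r for x, f in zip(values, freqs)) / n

values = [1, 2, 3, 4, 5, 6]          # spots on the die (Table 2.1)
freqs = [73, 83, 80, 57, 41, 66]     # observed frequencies

m0, m1, m2, m3 = (moment_about_zero(values, freqs, r) for r in range(4))
print(m0, m1, m2, m3)
```

The zeroth moment is 1 for any sample, and the first moment is the arithmetic mean m.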


In dealing with a grouped distribution such as that of Table 2.2, the moments 
are calculated on the assumption that all the individuals in a class can be re- 
garded as having the central value (or class-mark) of that class. Although this 
is not actually true, the errors caused by grouping are not usually serious unless 
the grouping is very coarse. (For a method of correction, see § 5.10.)

The numerical labor of the calculation may generally be substantially reduced 
by suitable coding. If one of the class-marks ($x_0$) is chosen near the center of the
table, and a new variable, u, is defined by

(2.7.2)    $u = (x - x_0)/c$

where c is the class-interval, the values of u will as a rule be much simpler to
work with than the original x values. As an illustration, Table 2.4 shows the
procedure for calculating the first three moments about zero from the data of
Table 2.2, where $x_0$ has been taken as 45.5 and c as 4. The values of $m'_r$ (r = 0, 1,
2, 3) are given in the last row of the table. The last column is a check column
(suggested by Charlier). The check depends on the identity:

(2.7.3)    $\sum f_i (u_i + 1)^3 = \sum f_i u_i^3 + 3 \sum f_i u_i^2 + 3 \sum f_i u_i + \sum f_i$.


TABLE 2.4

x        u      f      fu    $fu^2$   $fu^3$   $f(u+1)^3$
29.5    -4      1      -4      16      -64       -27
33.5    -3     14     -42     126     -378      -112
37.5    -2     56    -112     224     -448       -56
41.5    -1    172    -172     172     -172         0
45.5     0    245       0       0        0       245
49.5     1    263     263     263      263      2104
53.5     2    156     312     624     1248      4212
57.5     3     67     201     603     1809      4288
61.5     4     23      92     368     1472      2875
65.5     5      3      15      75      375       648

             1000     553    2471     4105     14177
$m'_r$          1   0.553   2.471    4.105


Thus 14,177 = 4105 + 3(2471) + 3(553) + 1000.
The values of $f(u + 1)^3$ are found by multiplying each f by the cube of the u
in the next line (since the values of u increase by 1 as we go down the column).
The arithmetic mean of the original variate X is found by multiplying the
moment $m'_1$ (in terms of u) by c and adding $x_0$. Here

m = 4(0.553) + 45.5 = 47.712 lb.
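The coding procedure and Charlier's check lend themselves to a short Python sketch. It uses the class-marks and frequencies of Table 2.2 with $x_0 = 45.5$ and c = 4 as in the text; the variable names are chosen here for illustration only.

```python
class_marks = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
freqs = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]
x0, c = 45.5, 4.0

# Coded variable u = (x - x0) / c, which is a whole number for every class-mark.
u = [round((x - x0) / c) for x in class_marks]

# Column totals: sum f, sum f*u, sum f*u^2, sum f*u^3.
sums = [sum(f * ui**r for ui, f in zip(u, freqs)) for r in range(4)]

# Charlier's check column: sum f*(u + 1)^3 must equal the expansion in (2.7.3).
check = sum(f * (ui + 1) ** 3 for ui, f in zip(u, freqs))
expected = sums[3] + 3 * sums[2] + 3 * sums[1] + sums[0]

n = sums[0]
coded_moments = [s / n for s in sums]     # m'_r in terms of u
mean = c * coded_moments[1] + x0          # back to the original units (lb)
print(sums, check, expected, mean)
```

If the check total and the expanded total disagree, an arithmetic slip has occurred somewhere in the columns.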


2.8 Moments about the Mean. Variance, skewness and kurtosis  The $r$th
moment about the mean of the variate X is defined as:

(2.8.1)    $m_r = N^{-1} \sum f_i (x_i - m)^r$

As before we see that $m_0 = 1$. Also, putting r = 1, we have

(2.8.2)    $m_1 = N^{-1} \sum f_i x_i - m = 0$

for every sample, so that $m_1$ does not tell us anything about the sample. However,
$m_2, m_3, m_4, \ldots$ are statistics which are often used for expressing the
characteristics of a sample and making inferences about the population.
Historically they have been of great importance in the development of mathematical
statistics, but the related k-statistics (see § 5.4) are now recognized to be
more convenient.

The second moment, $m_2$, is often called the variance. It is given by

(2.8.3)    $m_2 = N^{-1} \sum f_i (x_i - m)^2$
               $= N^{-1} \left( \sum f_i x_i^2 - 2m \sum f_i x_i + m^2 \sum f_i \right)$
               $= m'_2 - m^2$

since $\sum f_i x_i = Nm$ and $\sum f_i = N$.

The positive square root of the variance is called the standard deviation,
denoted usually by s. It is the most widely-used measure of the spread of a sample
distribution. Many authors, however, prefer to define the variance as the
second k-statistic, $k_2$, which is related to $m_2$ by means of the equation

(2.8.4)    $k_2 = \frac{N m_2}{N - 1} = \frac{1}{N - 1} \sum f_i (x_i - m)^2$

This definition has some advantages from the point of view of statistical
inference, and we shall adopt it in this book. Of course, in large samples there is
little difference between $k_2$ and $m_2$.
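As a numerical illustration of (2.8.3) and (2.8.4), the sketch below computes $m_2$ for the data of Table 2.1 both directly and from the shortcut $m'_2 - m^2$, and then forms $k_2$ and its square root. The variable names are illustrative assumptions, and the resulting figures are computed here rather than quoted from the text.

```python
import math

values = [1, 2, 3, 4, 5, 6]          # Table 2.1
freqs = [73, 83, 80, 57, 41, 66]

n = sum(freqs)
m = sum(f * x for x, f in zip(values, freqs)) / n
m2_prime = sum(f * x * x for x, f in zip(values, freqs)) / n

# Second moment about the mean, computed two ways.
m2_direct = sum(f * (x - m) ** 2 for x, f in zip(values, freqs)) / n
m2_shortcut = m2_prime - m**2

# Second k-statistic and the standard deviation based on it.
k2 = n * m2_shortcut / (n - 1)
s = math.sqrt(k2)
print(m2_direct, m2_shortcut, k2, s)
```

With N = 400 the two definitions of variance differ only in the third decimal place, in line with the remark about large samples.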

The spread of a distribution is most naturally and easily measured by the
range of the variate, that is, by the difference $x_N - x_1$, where the measured values
$x_1, x_2 \dots x_N$ are supposed to be arranged in increasing order of size. The
range, however, is not very convenient mathematically and is apt to be sensitive,
in large samples, to sampling fluctuations. The standard deviation makes use of
all the information in the sample, and is generally the most reliable measure of
spread, even though it is a little more troublesome to calculate. The meaning of
the standard deviation is perhaps most easily grasped by noting that in a good
many common types of distribution, which are more or less symmetrical about a
central value and tail off in both directions, roughly two-thirds of all the variates
(in a rather large sample) will lie within an interval of $x$ extending for one
standard deviation on either side of the mean. If the sample consists of several
hundred individuals, the standard deviation will usually be (roughly) one-sixth of
the range. This is worth remembering, as a guard against gross errors (such as
misplacing a decimal point) in the calculation of the standard deviation. Reasons
for these statements will appear when the normal distribution is discussed (see
§§ 3.13 and 8.21).

The third moment, $m_3$, and the fourth moment, $m_4$, depend on the shape of
the frequency polygon representing the distribution. Because of the cancelling
of positive and negative third powers, a symmetrical distribution will have
$m_3 = 0$. A distribution with a long tail extending out to the right will usually
have a positive $m_3$, while one with a long tail out to the left will usually have a
negative $m_3$. This is because the positive values of $(x - m)^3$ tend to outweigh
the negative values, or vice versa. The statistic $m_3$ may therefore be used as a
measure of the skewness of the distribution. It is not true, however, that if $m_3 = 0$
then the distribution is necessarily symmetrical [3]. To have a measure independent
of the units of $x$, it is customary to divide $m_3$ by $m_2^{3/2}$. The skewness
is then a pure number.

The fourth moment, $m_4$, divided by $m_2^2$ is a measure of kurtosis. This word,
which means "peakedness," was adopted because in some common types of
distribution a high value of $m_4$ is associated with a high central peak in the
frequency polygon. However, the value of $m_4$ is very much dependent on the
shape of the tails [4] and may have little to do with any central peak.

As we shall see later, the k-statistics $k_3$ and $k_4$ are more convenient than $m_3$
and $m_4$ for measuring skewness and kurtosis respectively. We are usually
interested in estimating these characteristics for a population, and the k-statistics
give better estimates than the moments.
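
As a rough illustration of the moment measures $m_3/m_2^{3/2}$ and $m_4/m_2^2$, the Python sketch below evaluates them for two small frequency tables, one symmetrical and one with a long right-hand tail. The tables and function names are hypothetical and serve only to show the sign behaviour described above.

```python
def central_moment(xs, fs, r):
    """m_r = N^{-1} * sum of f_j (x_j - m)^r, as in (2.8.1)."""
    n = sum(fs)
    m = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - m) ** r for x, f in zip(xs, fs)) / n

def moment_skewness(xs, fs):
    # m_3 divided by m_2^(3/2): a pure number
    return central_moment(xs, fs, 3) / central_moment(xs, fs, 2) ** 1.5

def moment_kurtosis(xs, fs):
    # m_4 divided by m_2^2: a pure number
    return central_moment(xs, fs, 4) / central_moment(xs, fs, 2) ** 2

xs = [1, 2, 3, 4, 5]
sym = moment_skewness(xs, [1, 2, 4, 2, 1])     # symmetrical table
right = moment_skewness(xs, [4, 3, 2, 1, 1])   # long tail to the right
kurt = moment_kurtosis(xs, [1, 2, 4, 2, 1])
print(sym, right, kurt)
```
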


2.9 Moments for a Probability Distribution  In many problems of statistical
inference, samples are drawn from a population which is assumed to be described
by a known type of mathematical distribution function. There may be one or
more parameters occurring in this function; the values of these parameters
are not known but can be estimated from the samples. One of the more common
methods of estimation makes use of the relation between the moments of a sample
and the moments of the population. For the present, the population will be
thought of as infinitely large and characterized by its distribution function $F(x)$.
The random variable $X$ may be discrete or continuous, and in the latter case a
density function $f(x)$ will exist (see § 1.16). Since $F(x)$ is the probability that
$X \le x$, the distribution of $X$ in the population is often called a probability
distribution.

If $X$ is discrete and $p_j$ is the probability that $X$ takes the value $x_j$, the
expectation of $X^r$ is defined as

(2.9.1)    $\mu'_r = E(X^r) = \sum_j x_j^r p_j$

This is called the $r$th moment of $X$ about zero. If there are infinitely many possible
values of $j$, it is assumed that the sum converges.

The first moment, $\mu'_1$, is the expectation of $X$ and is often called the population
mean. It will be denoted by $\mu$, without prime or subscript. It corresponds to the
sample statistic $m'_1$ $(= m)$ previously defined. We shall follow wherever
possible the very useful convention of distinguishing between sample and
population by the use of Latin letters such as $m$ for the former and Greek letters
such as $\mu$ for the latter.
If $X$ is continuous, the definition corresponding to equation (1) is

(2.9.2)    $\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx$

provided that the integral exists.

The expectation of $(X - \mu)^r$ is called the $r$th moment about the mean, and is
given by

(2.9.3)    $\mu_r = E(X - \mu)^r = \sum_j (x_j - \mu)^r p_j$

when $X$ is discrete, or by

(2.9.4)    $\mu_r = E(X - \mu)^r = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx$

when $X$ is continuous. Since $\sum_j p_j = 1$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$, it is obvious that
$\mu_0 = 1$ in all cases. Also,

(2.9.5)    $\mu_1 = E(X - \mu) = E(X) - \mu = 0$

for any kind of distribution that possesses a mean. The lowest useful value of $r$
is 2, and $\mu_2$ is a very important descriptive parameter for the population, known
as the population variance. The square root of $\mu_2$ is usually denoted by $\sigma$ and is
called the population standard deviation. It is a measure of the spread or dispersion
of the population about its mean. The sample standard deviation $s$ defined in
§ 2.8 provides an estimate of $\sigma$.

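Using the binomial theorem, we may write

(2.9.6)    $(x - \mu)^r = \sum_{q=0}^{r} \binom{r}{q} (-\mu)^q x^{r-q}$

This allows us to express the moments $\mu_r$ in terms of the moments $\mu'_r$ (which are
usually easier to calculate). We have, assuming that moments up to the $r$th exist,

(2.9.7)    $\mu_r = \sum_{q=0}^{r} \binom{r}{q} (-\mu)^q E(X^{r-q})$
                $= \sum_{q=0}^{r} \binom{r}{q} (-\mu)^q \mu'_{r-q}$

Thus, for example,

(2.9.8)    $\mu_2 = \mu'_2 - 2\mu\mu'_1 + \mu^2\mu'_0$
                $= \mu'_2 - \mu^2$

(note that $\mu'_1 = \mu$ and $\mu'_0 = 1$)

(2.9.9)    $\mu_3 = \mu'_3 - 3\mu\mu'_2 + 3\mu^2\mu'_1 - \mu^3\mu'_0$
                $= \mu'_3 - 3\mu\mu'_2 + 2\mu^3$

(2.9.10)   $\mu_4 = \mu'_4 - 4\mu\mu'_3 + 6\mu^2\mu'_2 - 4\mu^3\mu'_1 + \mu^4\mu'_0$
                $= \mu'_4 - 4\mu\mu'_3 + 6\mu^2\mu'_2 - 3\mu^4$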

The quantities $\mu_3/\sigma^3$ and $\mu_4/\sigma^4$ are measures of skewness and kurtosis,
respectively, for the population.

EXAMPLE 1  Assuming that a well-made die has a probability $\frac{1}{6}$ of falling
with any specified face uppermost, the expectation of the number of spots on
the upper face is

(2.9.11)    $\mu = \sum_{j=1}^{6} \frac{j}{6} = \frac{1}{6} \cdot 21 = 3.5$

The variance of this number is

(2.9.12)    $\mu_2 = \mu'_2 - \mu^2$
                 $= \frac{1}{6} \sum_{j=1}^{6} j^2 - (3.5)^2$
                 $= \frac{1}{6} \cdot 91 - (3.5)^2$
                 $= 2.917$

so that $\sigma = 1.71$. Note that the sum of the integers from 1 to $n$ is given by
$\frac{1}{2} n(n + 1)$ and the sum of the squares of the integers from 1 to $n$ is
$\frac{1}{6} n(n + 1)(2n + 1)$.
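
The conversion (2.9.7) from moments about zero to moments about the mean is easy to mechanize. The sketch below applies it to the die of Example 1 using exact fractions; the helper names `raw_moment` and `central_from_raw` are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb

def raw_moment(values, probs, r):
    """mu'_r = sum of x_j^r p_j, as in (2.9.1)."""
    return sum(p * x ** r for x, p in zip(values, probs))

def central_from_raw(raw):
    """Given raw[q] = mu'_q for q = 0..r, return mu_r by (2.9.7)."""
    r = len(raw) - 1
    mu = raw[1]
    return sum(comb(r, q) * (-mu) ** q * raw[r - q] for q in range(r + 1))

faces = range(1, 7)
probs = [Fraction(1, 6)] * 6
raw = [raw_moment(faces, probs, q) for q in range(5)]   # mu'_0 .. mu'_4

mean = raw[1]
variance = central_from_raw(raw[:3])   # mu_2, as in (2.9.8)
mu3 = central_from_raw(raw[:4])        # mu_3, as in (2.9.9)
mu4 = central_from_raw(raw[:5])        # mu_4, as in (2.9.10)
print(mean, variance, mu3, mu4)
```

The variance comes out as 35/12, which is the exact value behind the rounded figure 2.917, and $\mu_3$ vanishes because the distribution is symmetrical.
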
EXAMPLE 2  Suppose that a continuous variate $X$ may take values between
0 and 2, with a density function

$f(x) = x$,        $0 \le x \le 1$
$f(x) = 2 - x$,    $1 < x \le 2$

The graph is triangular, with a vertex of height 1 at $x = 1$.
The expectation of $X$ is obviously 1 (from considerations of symmetry) but
may be calculated from the relation

$\mu = \int_0^2 x f(x)\,dx$
    $= \int_0^1 x^2\,dx + \int_1^2 x(2 - x)\,dx$
    $= \frac{1}{3} + \frac{2}{3} = 1$

The variance of $X$ is $\mu_2 = \mu'_2 - \mu^2$, where $\mu'_2 = \int_0^2 x^2 f(x)\,dx = \frac{7}{6}$. Therefore,
$\mu_2 = \frac{1}{6}$ and $\sigma = 0.408$.
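
The integrals of Example 2 can be checked numerically. The following sketch uses a simple midpoint rule, which is an arbitrary choice made here for illustration (any quadrature method would serve); the function names are hypothetical.

```python
def density(x):
    """Triangular density of Example 2."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def integrate(func, a, b, steps=20000):
    """Midpoint-rule approximation to the integral of func over (a, b)."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

total = integrate(density, 0, 2)                          # should be close to 1
mean = integrate(lambda x: x * density(x), 0, 2)          # mu
second = integrate(lambda x: x * x * density(x), 0, 2)    # mu'_2
variance = second - mean ** 2                             # mu_2 = mu'_2 - mu^2
sigma = variance ** 0.5
print(total, mean, second, variance, sigma)
```
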


2.10 Generating Functions  It is often convenient mathematically to consider
functions which serve to “generate” moments or other characteristics of a
population. When such a function of a real variable (say $h$) is expanded in powers
of $h$, the coefficients of $h$, $h^2/2!$, $h^3/3!$ ... form the set of moments or other
quantities. It is in this sense that these quantities may be thought of as generated
by the function.

For a discrete variate the moment generating function (m.g.f.) is defined by

(2.10.1)    $M(h) = E(e^{hX}) = \sum_j e^{h x_j} p(x_j)$
                 $= \sum_j \left( 1 + h x_j + \frac{h^2}{2!} x_j^2 + \dots \right) p(x_j)$
                 $= 1 + \mu'_1 h + \mu'_2 \frac{h^2}{2!} + \dots$

where it is assumed that the indicated sum converges.
For a continuous variate the m.g.f. is defined by

(2.10.2)    $M(h) = \int_{-\infty}^{\infty} e^{hx} f(x)\,dx$

provided that the integral converges, and, if so, this reduces to the same series as
in equation (1) above.


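EXAMPLE 3  For a symmetrical die, the values of $x_j$ are 1, 2, 3, 4, 5, 6, and
$p(x_j)$ is $\frac{1}{6}$ for each of them. Therefore,

$M(h) = \frac{1}{6} (e^h + e^{2h} + \dots + e^{6h})$
      $= \frac{1}{6} e^h (e^{6h} - 1)(e^h - 1)^{-1}$

Written out in powers of $h$, this function becomes

$\frac{1}{6} \left( 6 + S_1 h + S_2 \frac{h^2}{2!} + S_3 \frac{h^3}{3!} + \dots \right)$

where $S_1 = 1 + 2 + \dots + 6 = 21$, $S_2 = 1^2 + 2^2 + \dots + 6^2 = 91$,
$S_3 = 1^3 + 2^3 + \dots + 6^3 = 441$, etc.
Therefore, $\mu'_1 = S_1/6 = 7/2$, $\mu'_2 = S_2/6 = 91/6$, etc.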
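
The agreement between the closed form of $M(h)$ and its power series can be checked numerically for a small value of $h$. This is only a sketch: the value $h = 0.1$ and the truncation at twelve terms are arbitrary choices, and the function names are invented here.

```python
from math import exp, factorial

def mgf_die(h):
    """M(h) for a symmetrical die, straight from definition (2.10.1)."""
    return sum(exp(h * x) for x in range(1, 7)) / 6

def mgf_series(h, terms=12):
    """Truncated series sum of (S_r / 6) h^r / r!, with S_0 = 6."""
    total = 0.0
    for r in range(terms):
        s_r = sum(x ** r for x in range(1, 7))   # S_r = 1^r + ... + 6^r
        total += (s_r / 6) * h ** r / factorial(r)
    return total

h = 0.1
closed = exp(h) * (exp(6 * h) - 1) / (exp(h) - 1) / 6   # geometric-sum form
print(mgf_die(h), closed, mgf_series(h))
```
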


EXAMPLE 4  For the continuous rectangular distribution specified by

$f(x) = \frac{1}{2a}$,    $-a < x < a$
$f(x) = 0$,               $x < -a$ or $x > a$

we obtain

(2.10.3)    $M(h) = \int_{-a}^{a} \frac{e^{hx}}{2a}\,dx = \frac{1}{2ah} (e^{ah} - e^{-ah})$
                 $= \frac{1}{ah} \sinh ah$.

Expressed as a power series,

$M(h) = 1 + \frac{a^2 h^2}{3!} + \frac{a^4 h^4}{5!} + \dots$

so that the odd moments are all zero and the even moments are given by

$\mu'_2 = \frac{a^2}{3}$,    $\mu'_4 = \frac{a^4}{5}$,    etc.

These are the coefficients of $h^2/2!$, $h^4/4!$, etc. in the above series. Since the mean
of the distribution is zero, the moments $\mu_r$ are the same as the moments $\mu'_r$.
A fuller discussion of moment generating functions may be found in [5].


*2.11 Factorial Moments  For a variate $X$ which is discrete and takes values
spaced at unit intervals it is sometimes convenient to use factorial moments,
defined by

(2.11.1)    $\mu'_{(r)} = \sum_j (x_j)_r\, p(x_j)$

where $(x_j)_r = x_j (x_j - 1)(x_j - 2) \dots (x_j - r + 1)$.

For the die of Example 1 in § 2.9 we have $\mu'_{(1)} = 21/6$, $\mu'_{(2)} = 70/6$,
$\mu'_{(3)} = 210/6$, etc. The highest non-zero moment is $\mu'_{(6)} = 5!$.

The factorial moment generating function (f.m.g.f.) is given by

(2.11.2)    $G(h) = \sum_j (1 + h)^{x_j} p(x_j)$

For the die just mentioned this becomes

$G(h) = \frac{1}{6} \sum_{x=1}^{6} (1 + h)^x$
      $= 1 + \frac{7h}{2} + \frac{35h^2}{6} + \frac{35h^3}{6} + \frac{7h^4}{2} + \frac{7h^5}{6} + \frac{h^6}{6}$
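
The factorial moments of the die can be tabulated exactly with a few lines of Python. This is a sketch only; `falling_factorial` and `factorial_moment` are names chosen for the illustration. The coefficient of $h^r$ in $G(h)$ is obtained as $\mu'_{(r)}/r!$.

```python
from fractions import Fraction
from math import factorial

def falling_factorial(x, r):
    """(x)_r = x (x - 1) ... (x - r + 1), with (x)_0 = 1."""
    result = 1
    for i in range(r):
        result *= x - i
    return result

def factorial_moment(values, probs, r):
    """mu'_(r) as defined in (2.11.1)."""
    return sum(p * falling_factorial(x, r) for x, p in zip(values, probs))

faces = list(range(1, 7))
probs = [Fraction(1, 6)] * 6
fact_moments = [factorial_moment(faces, probs, r) for r in range(8)]

# Coefficient of h^r in the f.m.g.f. G(h) is mu'_(r) / r!
fmgf_coeffs = [fact_moments[r] / factorial(r) for r in range(7)]
print(fact_moments)
print(fmgf_coeffs)
```
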


2.12 Cumulants  If the logarithm (to base $e$) of the m.g.f. can be expanded as
a series of powers of $h$ (which converges in some interval including $h = 0$) in
the form

(2.12.1)    $K(h) = \log_e M(h) = \kappa_1 h + \kappa_2 \frac{h^2}{2!} + \kappa_3 \frac{h^3}{3!} + \dots = \sum_{r=1}^{\infty} \kappa_r \frac{h^r}{r!}$

then the coefficients $\kappa_r$ of $h^r/r!$ are called the cumulants of the distribution ($\kappa$ is the
Greek letter kappa) and $K(h)$ is called the cumulant generating function (c.g.f.).
The cumulants play an important role in sampling theory, as was first pointed
out by Sir R. A. Fisher, who emphasized their advantages over moments. As
will be seen shortly, the first cumulant is the same as $\mu'_1$ (the population mean)
and the second and third are the same as $\mu_2$ and $\mu_3$ respectively. The higher
cumulants differ, however, from the corresponding moments.

If the origin is taken at the population mean (so that $\mu'_1 = 0$ and $\mu'_r$ is
therefore the same as $\mu_r$ for all $r$), we have

(2.12.2)    $M(h) = 1 + \sum_{r=2}^{\infty} \mu_r \frac{h^r}{r!}$

Also, by the definition of $K(h)$,

(2.12.3)    $\frac{dK(h)}{dh} = \frac{1}{M(h)} \frac{dM(h)}{dh}$
                 $= \kappa_1 + \kappa_2 h + \kappa_3 \frac{h^2}{2!} + \kappa_4 \frac{h^3}{3!} + \dots$

From the definition of $M(h)$ in Eq. (2),

(2.12.4)    $\frac{dM(h)}{dh} = \mu_2 h + \mu_3 \frac{h^2}{2!} + \mu_4 \frac{h^3}{3!} + \dots$

so that, from Eqs. (2), (3) and (4),

$\left( 1 + \mu_2 \frac{h^2}{2!} + \mu_3 \frac{h^3}{3!} + \dots \right) \left( \kappa_1 + \kappa_2 h + \kappa_3 \frac{h^2}{2!} + \kappa_4 \frac{h^3}{3!} + \dots \right)$
    $= \mu_2 h + \mu_3 \frac{h^2}{2!} + \mu_4 \frac{h^3}{3!} + \dots$

By equating coefficients of corresponding powers of $h$ on the two sides of this
equation, we find

(2.12.5)    $\kappa_1 = 0$,  $\kappa_2 = \mu_2$,  $\kappa_3 = \mu_3$,  $\kappa_4 = \mu_4 - 3\mu_2^2$, ...
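
The identity $M(h)\,dK/dh = dM/dh$ used above holds for any origin, and equating coefficients of $h^{n-1}$ then gives the recursion $\mu'_n = \sum_{j=1}^{n} \binom{n-1}{j-1} \kappa_j \mu'_{n-j}$. The Python sketch below solves this recursion for the cumulants of the die of Example 1, § 2.9, and checks the result against (2.12.5). The function name and the choice of example are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def cumulants_from_raw(raw):
    """Given raw[n] = mu'_n (with raw[0] = 1), return [None, k_1, k_2, ...]."""
    kappa = [None]                      # index 0 unused so that kappa[r] = k_r
    for n in range(1, len(raw)):
        correction = sum(comb(n - 1, j - 1) * kappa[j] * raw[n - j]
                         for j in range(1, n))
        kappa.append(raw[n] - correction)
    return kappa

faces = range(1, 7)
p = Fraction(1, 6)
raw = [sum(p * x ** r for x in faces) for r in range(5)]    # mu'_0 .. mu'_4
kappa = cumulants_from_raw(raw)

# Moments about the mean, computed directly for comparison.
mean = raw[1]
mu2 = sum(p * (x - mean) ** 2 for x in faces)
mu3 = sum(p * (x - mean) ** 3 for x in faces)
mu4 = sum(p * (x - mean) ** 4 for x in faces)
print(kappa[1:], mu2, mu3, mu4 - 3 * mu2 ** 2)
```

Here the first cumulant is the mean 7/2 rather than 0, because the origin has not been moved to the mean; the cumulants from the second onwards agree with (2.12.5).
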
The most common measure of kurtosis for a population is $\kappa_4/\kappa_2^2 = (\mu_4/\mu_2^2) - 3$.
If the origin from which $x$ is measured is changed from $x = 0$ to $x = a$, any
given value $x$ is changed to $x - a$. This will make no difference to any of the
moments about the mean (since the mean will also be changed to $\mu - a$), and
therefore will not change any of the cumulants from $\kappa_2$ on. The first cumulant,
however, becomes $\kappa_1 - a$. In the above derivation $a$ was taken as the population
mean $\mu$, so that before the shift we should have

(2.12.6)    $\kappa_1 = \mu$.

If the scale of measurement is altered so that a value previously recorded as
$x$ now becomes $bx$, the effect on $M(h)$ is to replace $h$ by $bh$. The $r$th moment
(whether about the origin or the mean) and the $r$th cumulant are multiplied by $b^r$.
For the f.m.g.f., a change of origin has the effect of multiplying $G(h)$ by $(1 + h)^{-a}$
and a change of scale replaces $1 + h$ by $(1 + h)^b$.

The most important property of moment generating functions is that if
$X_1, X_2 \dots X_N$ are independent variates, and if $L$ is a linear function of the $X$'s
given by $L = c_1 X_1 + c_2 X_2 + \dots + c_N X_N$ (the $c$'s being arbitrary constants,
not all zero), then the m.g.f. of $L$ is

(2.12.7)    $M_L(h) = \prod_{j=1}^{N} M_j(c_j h)$

i.e., it is the product of the m.g.f.'s for the separate variates, with $c_j h$ substituted
for $h$. From the definition of $K(h)$ it follows that

(2.12.8)    $K_L(h) = \sum_{j=1}^{N} K_j(c_j h)$

so that the separate c.g.f.’s are added. To find the c.g.f. for a sum of independent
variates all we have to do is to add the c.g.f.'s for the variates taken separately.
This is the principal reason for introducing cumulants.
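
A small numerical check of the product rule may help. The sketch below takes two independent symmetrical dice (an example chosen here, with $c_1 = c_2 = 1$), computes the m.g.f. of their sum directly from the 36 equally likely outcomes, and compares it with the square of the m.g.f. of one die. It also confirms that the second cumulant, the variance, is simply doubled.

```python
from fractions import Fraction
from itertools import product
from math import exp

faces = range(1, 7)

def mgf_die(h):
    """m.g.f. of a single symmetrical die."""
    return sum(exp(h * x) for x in faces) / 6

def mgf_sum_of_two(h):
    """m.g.f. of the sum, taken directly over the 36 joint outcomes."""
    return sum(exp(h * (x + y)) for x, y in product(faces, faces)) / 36

h = 0.3                                   # arbitrary test value
direct = mgf_sum_of_two(h)
factored = mgf_die(h) ** 2                # product of the separate m.g.f.'s

var_one = Fraction(35, 12)                # variance of one die
var_sum = sum(Fraction((x + y - 7) ** 2, 36) for x, y in product(faces, faces))
print(direct, factored, var_sum)
```
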

In § 2.8 the k-statistics were briefly mentioned. These are simply related to
the cumulants of the population. In fact, the expectation of the $r$th k-statistic $k_r$
is the corresponding cumulant $\kappa_r$. The $k_r$ are therefore often used as estimates
of the cumulants $\kappa_r$ (at any rate for $r < 4$). See § 5.8.


* 2.13 Characteristic Functions  For some distributions the moment generating
function does not exist. If, however, we replace $h$ by $ih$ ($i = \sqrt{-1}$) in the
definition (2.10.1) and define the characteristic function by

(2.13.1)    $C(h) = \sum_j e^{ihx_j} p(x_j)$

or

(2.13.2)    $C(h) = \int_{-\infty}^{\infty} e^{ihx} f(x)\,dx$

then $C(h)$ always exists, as a complex number, for any distribution for which the
$p(x_j)$, or $f(x)$, are defined. It may be written as a series:

(2.13.3)    $C(h) = 1 + ih\mu'_1 - \frac{h^2}{2!}\mu'_2 - \frac{ih^3}{3!}\mu'_3 + \dots$

and so may be regarded as generating moments in the same sort of way as $M(h)$.

It may be noted that corresponding to (2) there is a reciprocal relation

(2.13.4)    $2\pi f(x) = \int_{-\infty}^{\infty} e^{-ihx} C(h)\,dh$

The density function $f(x)$ is said to be the Fourier transform of $C(h)$. Tables of
the Fourier transform (such as those in [6]) may be useful in finding $C(h)$, given
$f(x)$, or vice versa.


2.14 Bienaymé's Theorem  If $X_1, X_2 \dots X_N$ are pairwise independent
variates (see § 1.10), and if $L$ is a linear combination given by $L = \sum_j c_j X_j$,
then the variance of $L$ is

(2.14.1)    $V(L) = \sum_j c_j^2 V(X_j)$

If, for example, all the $c_j$ are equal to 1, the theorem states that the variance of a
sum of pairwise independent variates is equal to the sum of their variances.

To prove this, let $E(X_j)$ be $\mu_{(j)}$, where the $j$ is placed in parentheses to show that
$\mu_{(j)}$ is not a $j$th moment but the first moment of the $j$th variate. By Theorem 1.16,

$E(L) = \sum_j c_j \mu_{(j)}$

From the definition of variance,

(2.14.2)    $V(L) = E[L - E(L)]^2$
                 $= E\left[ \sum_j c_j (X_j - \mu_{(j)}) \right]^2$
                 $= E \sum_j c_j^2 (X_j - \mu_{(j)})^2 + E \sum_{j \ne k} c_j c_k (X_j - \mu_{(j)})(X_k - \mu_{(k)})$

(Note that in forming the square of a sum of $N$ quantities there are $N$ terms in
which each quantity is squared, and $N(N - 1)$ terms in which each quantity is
multiplied by a different one. These latter are the double sum above for which
$j \ne k$.)

Now $V(X_j) = E(X_j - \mu_{(j)})^2$. Also we may define the covariance of two
distinct variates, $X_j$ and $X_k$, by

(2.14.3)    $C(X_j, X_k) = E[(X_j - \mu_{(j)})(X_k - \mu_{(k)})]$

Equation (2) then states that

(2.14.4)    $V(L) = \sum_j c_j^2 V(X_j) + \sum_{j \ne k} c_j c_k C(X_j, X_k)$

If $X_j$ and $X_k$ are independent, it follows from Theorem 1.17 that

$E[(X_j - \mu_{(j)})(X_k - \mu_{(k)})] = E(X_j - \mu_{(j)})\,E(X_k - \mu_{(k)}) = 0$

since $E(X_j) = \mu_{(j)}$ by definition. Equation (4) therefore reduces to Eq. (1).
Variates which are such that their covariance is zero are said to be uncorrelated.
It is sufficient for this theorem that the variates should be pairwise
uncorrelated, and they need not be independent in the full sense. (See § 1.10.)


The Pearson coefficient of correlation between two variates $X_j$ and $X_k$ is
defined by

(2.14.5)    $\rho_{jk} = \frac{C(X_j, X_k)}{[V(X_j) V(X_k)]^{1/2}}$

It is a pure number with range from $-1$ to $+1$ inclusive, and is zero when $X_j$ and
$X_k$ are uncorrelated. If we write $V(X_j) = \sigma_j^2$, and $C(X_j, X_k) = \rho_{jk}\sigma_j\sigma_k$, equation
(4) above becomes

(2.14.6)    $V(L) = \sum_j c_j^2 \sigma_j^2 + \sum_{j \ne k} c_j c_k \rho_{jk} \sigma_j \sigma_k$

From the definition it is obvious that $\rho_{jk} = \rho_{kj}$.
The coefficient of correlation is important in problems involving two or
more variates, and its properties will be discussed more fully in chapter 11. The
method of calculating a sample statistic $r$ for estimating $\rho$ is given in § 11.14.
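
When the variates are correlated, the covariance terms of (2.14.4) cannot be dropped. The sketch below uses a hypothetical joint distribution of two dependent 0-1 variates, invented purely for illustration, and compares the variance of $L = 2X_1 + 3X_2$ computed directly with the value given by (2.14.4).

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint probabilities for (X_1, X_2); the values are illustrative.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}
c = (2, 3)   # coefficients c_1, c_2 of the linear function L

def expect(func):
    """Expectation of func(x) over the joint distribution."""
    return sum(p * func(x) for x, p in joint.items())

means = [expect(lambda x, j=j: x[j]) for j in range(2)]

def cov(j, k):
    """C(X_j, X_k) as in (2.14.3); cov(j, j) is the variance V(X_j)."""
    return expect(lambda x: (x[j] - means[j]) * (x[k] - means[k]))

def linear(x):
    return c[0] * x[0] + c[1] * x[1]

direct = expect(lambda x: linear(x) ** 2) - expect(linear) ** 2
by_formula = (sum(c[j] ** 2 * cov(j, j) for j in range(2))
              + sum(c[j] * c[k] * cov(j, k)
                    for j in range(2) for k in range(2) if j != k))
rho = float(cov(0, 1)) / sqrt(float(cov(0, 0) * cov(1, 1)))   # (2.14.5)
print(direct, by_formula, rho)
```
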


2.15 Markov’s Inequality  This states that if $X$ is a non-negative variate with
expectation $\mu$, and if $\lambda$ is any real positive number,

(2.15.1)    $P(X \ge \lambda) \le \mu/\lambda$.

To prove this, we note that the set of all possible values of $X$ can be divided
into two sub-sets $X_1$ and $X_2$, where $X_1$ contains all values $\ge \lambda$ and $X_2$ all values
$< \lambda$. By definition,

(2.15.2)    $\mu = E(X) = \int_0^{\infty} x p(x)\,dx$
                 $= \int_0^{\lambda} x p(x)\,dx + \int_{\lambda}^{\infty} x p(x)\,dx$
                 $\ge \int_{\lambda}^{\infty} x p(x)\,dx$

since $p(x)$ is never negative.
But since in this last integral $x$ is not less than $\lambda$,

(2.15.3)    $\int_{\lambda}^{\infty} x p(x)\,dx \ge \lambda \int_{\lambda}^{\infty} p(x)\,dx = \lambda P(X \ge \lambda)$.

From (2) and (3),

$\mu \ge \lambda P(X \ge \lambda)$

which is equivalent to (1).

2.16 Chebyshev's Inequality (attributed also to Bienaymé)  This is a deduction
from Markov's inequality and states that for a variate $X$, possessing
first and second moments,

(2.16.1)    $P(|X - \mu| \ge \lambda) \le \frac{\sigma^2}{\lambda^2}$

where $E(X) = \mu$, $V(X) = \sigma^2$.
If in (2.15.1) we substitute for $X$ the non-negative quantity $(X - \mu)^2$, for
which the expectation is $\sigma^2$, and for $\lambda$ the quantity $\lambda^2$, we obtain

$P[(X - \mu)^2 \ge \lambda^2] \le \frac{\sigma^2}{\lambda^2}$

But to say that $(X - \mu)^2 \ge \lambda^2$ is the same as to say that $|X - \mu| \ge \lambda$, whence the
theorem follows.

EXAMPLE 5  If $X$ is the sum of spots showing up on two good dice, it is
easily calculated that $E(X) = 7$ and $V(X) = 35/6$. (The two dice are supposed
to fall independently, and Bienaymé's theorem, with Theorem 1.16, gives the
result.)

The probability that $|X - 7| \ge 4$ is therefore $\le 35/96$. The actual probability
is 1/6, so that the inequality is not very sharp. However, Chebyshev's
inequality is of great theoretical importance because of its wide applicability to
a variety of distributions.

Several similar inequalities, usually requiring further restrictions on the 
variate, are known. See [7]. 
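
The figures in Example 5 can be confirmed by enumerating the 36 equally likely outcomes for two dice. The sketch below also evaluates a Markov bound for $P(X \ge 11)$; that particular threshold is chosen here only for illustration and does not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Sums of spots for all 36 equally likely outcomes of two dice.
outcomes = [x + y for x, y in product(range(1, 7), repeat=2)]
p = Fraction(1, 36)

mean = sum(p * s for s in outcomes)
variance = sum(p * (s - mean) ** 2 for s in outcomes)

lam = 4
chebyshev_bound = variance / lam ** 2                        # sigma^2 / lambda^2
actual = sum(p for s in outcomes if abs(s - mean) >= lam)    # true probability

markov_bound = mean / 11                                     # mu / lambda for lambda = 11
actual_tail = sum(p for s in outcomes if s >= 11)
print(mean, variance, chebyshev_bound, actual, markov_bound, actual_tail)
```

Both bounds hold, and both are far from sharp for this distribution.
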


2.17 The Joint Distribution of Two Variates  We have seen that the expectation
and the variance can be readily obtained for a sum of random variables, if
the separate expectations and variances, as well as the covariances, are known.
If the variates are independent, the moment generating function and cumulant
generating function for the sum are easily found from those for the individual
variates (§ 2.12). If, however, the distribution function itself is required, the
calculation is usually more difficult, even for independent variates.

Let us first suppose that $X$ and $Y$ are continuous variates. If $f(x, y)\,dx\,dy$
is the probability that at the same time $X$ takes the value $x$ (to $x + dx$) and $Y$ the
value $y$ (to $y + dy$), then $f(x, y)$ is called the joint probability density for $X$ and $Y$.
The density for $X$ alone, regardless of $Y$, is

(2.17.1)    $g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$

and the density for $Y$ alone, regardless of $X$, is

(2.17.2)    $h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$

The variates $X$ and $Y$ are independent if, and only if,

(2.17.3)    $f(x, y) = g(x) h(y)$

The distribution functions for $X$ and $Y$, respectively, are

(2.17.4)    $G(x) = \int_{-\infty}^{x} g(u)\,du$,    $H(y) = \int_{-\infty}^{y} h(v)\,dv$

while the joint distribution function is

(2.17.5)    $F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u, v)\,dv\,du$

If the variates $X$ and $Y$ are discrete, there is no density function but the
distribution functions exist. The joint distribution function is

(2.17.6)    $F(x, y) = \sum_{x_i \le x} \sum_{y_j \le y} f(x_i, y_j)$.

Whether $X$ and $Y$ are continuous or discrete, the necessary and sufficient condition
for independence is

(2.17.7)    $F(x, y) = G(x) H(y)$.


* 2.18 The Distribution Function for a Sum of Two Independent Variates  Let
$Z = X + Y$ and let $P(z)$ be the distribution function for $Z$. By definition, $P(z)$
is the probability that $X + Y \le z$. If we plot possible values of $X$ and $Y$, using
rectangular coordinates in the plane, the region corresponding to $X + Y \le z$ is
all that part of the plane lying below and to the left of the line $X + Y = z$
(see Figure 15). For any given $y$, the probability that $X \le z - y$ is $G(z - y)$,
so that the required probability is obtained by multiplying $G(z - y)$ by the
probability for $y$ and integrating over all $y$.

(2.18.1)    $P(z) = \int_{-\infty}^{\infty} G(z - y) h(y)\,dy$

FIG. 15  SPACE OF THE VARIATE $Z = X + Y$

By differentiating with respect to $z$ we obtain

(2.18.2)    $p(z) = \int_{-\infty}^{\infty} g(z - y) h(y)\,dy$

This is called the convolution of the density functions $g$ and $h$. It is the
density function for $Z$.

EXAMPLE 6  Let $X$ have a uniform distribution on (0, 1) and $Y$ a symmetrical
triangular distribution on (0, 2). Then $Z$ has a distribution, which we wish to
find, on (0, 3), since $z$ cannot take any values outside this interval.
The given density functions are

(2.18.3)    $g(x) = 1$, $0 < x < 1$
            $h(y) = y$, $0 < y < 1$;  $h(y) = 2 - y$, $1 < y < 2$

Since $g(z - y)$ is 1 for values of $y$ between $z - 1$ and $z$, and is 0 for all other
values,

(2.18.4)    $p(z) = \int_{z-1}^{z} h(y)\,dy$

Now, from the definition of $h(y)$, $p(z)$ has different expressions for the three
intervals of $z$, namely, (0, 1), (1, 2) and (2, 3). In fact,

            $p(z) = \int_0^{z} y\,dy = \frac{1}{2} z^2$,    $0 < z < 1$

(2.18.5)    $p(z) = \int_{z-1}^{1} y\,dy + \int_1^{z} (2 - y)\,dy = -z^2 + 3z - \frac{3}{2}$,    $1 < z < 2$

            $p(z) = \int_{z-1}^{2} (2 - y)\,dy = \frac{1}{2} (3 - z)^2$,    $2 < z < 3$

The graph of $p(z)$ is formed of parts of three parabolas, joined together. It is
symmetrical about $z = 1\frac{1}{2}$.
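
The piecewise result (2.18.5) can be checked against a direct numerical evaluation of the integral (2.18.4). The sketch below uses a midpoint rule with an arbitrary number of steps; the function names are illustrative, and the middle branch is written in the equivalent form $\frac{3}{4} - (z - \frac{3}{2})^2$.

```python
def h(y):
    """Symmetrical triangular density on (0, 2), as in (2.18.3)."""
    if 0 < y <= 1:
        return y
    if 1 < y < 2:
        return 2 - y
    return 0.0

def p_exact(z):
    """Density of Z = X + Y from (2.18.5)."""
    if 0 < z <= 1:
        return z * z / 2
    if 1 < z <= 2:
        return 0.75 - (z - 1.5) ** 2     # same as -z^2 + 3z - 3/2
    if 2 < z < 3:
        return (3 - z) ** 2 / 2
    return 0.0

def p_numeric(z, steps=2000):
    """Midpoint-rule value of the integral of h(y) over (z - 1, z)."""
    width = 1.0 / steps
    return width * sum(h(z - 1 + (i + 0.5) * width) for i in range(steps))

for z in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(z, p_exact(z), p_numeric(z))
```
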


2.19 Joint Distribution of $k$ Variates  The notation and definitions of § 2.17
may be extended to three or more variates. The variates $X_1, X_2 \dots X_k$ are
independent if and only if the joint distribution function is equal to the product
of the separate distribution functions,

(2.19.1)    $F(x_1, x_2 \dots x_k) = F_1(x_1) F_2(x_2) \dots F_k(x_k)$

If the variates are continuous, possessing density functions $f_1(x_1), f_2(x_2) \dots$
$f_k(x_k)$, the joint density function is

(2.19.2)    $f(x_1, x_2 \dots x_k) = f_1(x_1) f_2(x_2) \dots f_k(x_k)$
As we sha 
sample of k i 


(2.19.3)    $f(x_1, x_2 \dots x_k) = f(x_1) f(x_2) \dots f(x_k)$

PROBLEMS

A (§§ 2.1-2.8)

1. Criticise the following statements (occasionally seen in examination answers):
“Median = $N/2$,” “$Q_1 = N/4$.”

2. Construct a histogram showing the age distribution of deaths of infants under
one month from the following table, taken from an official publication of the United
States Government:

Age at Death                     Frequency

Under 1 day                        26,665
1 day                               8,364
2 days                              6,344
3 to 6 days                        12,375
1 week                             10,911
2 weeks                             7,717
3 weeks but under 1 month           6,212

Total                              78,588

Hint: “1 day" means anything over one day but under two days; “3 to 6" means over
three but under seven, and so on. Take the month as 30 days long. The areas of the
rectangles, not the heights, represent the corresponding frequencies.

3. Construct a cumulative frequency table and a cumulative frequency polygon for
the data in Problem 2. What was the approximate probability (at the time these data
were collected) that an infant who lived for less than a month died in the first week?

4. The following table gives the results of 280 tests made on a certain kind of coal
for ash content:

Percentage Ash     Frequency

 3.0- 3.9               1
 4.0- 4.9               1
 5.0- 5.9              28
 6.0- 6.9              78
 7.0- 7.9              84
 8.0- 8.9              45
 9.0- 9.9              28
10.0-10.9               7
11.0-11.9               2

Total                 280

Calculate the median and the first and third quartiles. Find the percentile rank of an
ash content of 8.5%. Hint: Form a percentage cumulative frequency table, corresponding
to the values of $x_e$. The percentile rank of 8.5 is the corresponding percentage
cumulative frequency.
5. Calculate Ds and Рто for the data of Problem 2; also the percentile rank of 10 


days. State in words what each of these statistics means.

6. For a set of 15 ungrouped sample measurements we find $\sum x = 480$, $\sum x^2 =
15{,}735$. Find the mean and standard deviation of $X$.

7. For a sample of size 2, show that $m_2 = (x_1 - x_2)^2/4$, and $s$ $(= k_2^{1/2})$ $= |x_1 - x_2|/\sqrt{2}$.

8. Calculate the first three moments about zero for the distribution of Question 4.
Then obtain the mean, variance ($k_2$), the standard deviation ($k_2^{1/2}$) and the moment
measure of skewness ($m_3/m_2^{3/2}$) for this distribution. Hint: Use the coded variable
$u = x - 7.45$.

9. Prove that if we have a set of $N_1$ values $x_{1i}$ ($i = 1, 2 \dots N_1$) and another set of
$N_2$ values $x_{2j}$ ($j = 1, 2 \dots N_2$) of the variate $X$, then the mean of the combined set $\bar{x}$ is
given in terms of the two separate means $\bar{x}_1$ and $\bar{x}_2$ by the relation $(N_1 + N_2)\bar{x} =
N_1 \bar{x}_1 + N_2 \bar{x}_2$.

B (§§ 2.9-2.13) 

1. A variate $X$ has the density function $f(x) = c(12x + x^2 - x^3)$, $0 \le x \le 4$,
$f(x) = 0$ for all other values of $x$. Find $c$ and calculate the mean, standard deviation and
skewness for this distribution. Sketch the curve for $f(x)$.

2. A variate $X$ has the density function $f(x) = x/2$ $(0 < x < 1)$, $f(x) =
1/2$ $(1 < x < 2)$ and $f(x) = (3 - x)/2$ $(2 < x < 3)$. Find the variance and the moment
measure of kurtosis of $X$. Hint: This distribution is symmetrical about $x = 3/2$.
Calculate moments for $u = x - 1\frac{1}{2}$.

3. A continuous variate has the density function $f(x) = C x^{1/2} (1 - x)^{3/2}$, $0 < x < 1$.
Show that this function vanishes with infinite slope at $x = 0$, vanishes with zero slope
at $x = 1$ and has a maximum at $x = 1/4$. Sketch the curve. Calculate $C$ and find
the mean and variance. Hint: Put $x = \sin^2\theta$ and use a reduction formula for integration.

4. The density function for $X$ is $f(x) = c x^2 e^{-x}$ $(x > 0)$. Calculate $c$ and also the
mean and variance of $X$. Find the cumulant generating function for $X$.

5. Find the moment generating function for the triangular distribution of Example
2, § 2.9.

6. A distribution has the m.g.f. $M(h) = (q + p e^h)$, where $p$ and $q$ are constants
and $p + q = 1$. Find the c.g.f. and the first four cumulants.

7. From a point on the circumference of a circle of radius $a$, a chord is drawn in
a random direction. Show that the expected value of the length of the chord is $4a/\pi$
and that the variance of the length is $2a^2(1 - 8/\pi^2)$. Hint: See Example 8, § 1.16.
Take the density $f(\theta)$ as $1/\pi$.

8. If a variate can take any value from 0 to 1 with equal probability, show that its
standard deviation is $\sqrt{3}/6 = 0.289$. A set of two-digit random numbers such as that
in Appendix B.1 may be regarded as giving approximate random choices from the
interval (0, 1), number 43 for instance being read as 0.43. Use the result of Problem
A-7 to obtain an estimate of $s$ from 50 samples, each of size two, taken from Table B.1
and compare with the theoretical value.

9. The Cauchy distribution is defined by the density function
$f(x) = \frac{1}{\pi} \cdot \frac{a}{a^2 + (x - b)^2}$, $-\infty < x < \infty$, $a > 0$.
Show that the mean and variance of this distribution do not
exist, but that the mean is $b$ if the improper integral defining it is interpreted as the
Cauchy principal value (see Appendix A.3).

10. The Pareto distribution is defined by $f(x) = \frac{\alpha}{\beta} \left( \frac{\beta}{x} \right)^{\alpha + 1}$ $(x \ge \beta)$, and $f(x) = 0$
otherwise. Show that the $r$th moment exists only if $\alpha > r$. Find the expectation and
variance of $x$ if $\alpha > 2$.

11. Find the median and the mode of the distribution with density $f(x) = abx^{a-1}/
(1 + bx^a)^2$, $b > 0$, $a > 1$, $0 < x < \infty$. Hint: The median is that value $x$ for which
$\int_0^{x} f(x)\,dx = \int_{x}^{\infty} f(x)\,dx$. The mode is that value $x$ for which $f(x)$ is a maximum.

12. Prove that the characteristic function of the Laplace distribution, with density
$f(x) = \frac{1}{2} e^{-|x|}$ $(-\infty < x < \infty)$ is $C(h) = (1 + h^2)^{-1}$. Calculate the variance of this
distribution.

13. A discrete variate $X$ has a distribution in the population defined by $f(x) =
\theta(1 - \theta)^x$, for $x = 0, 1, 2 \dots$, where $\theta$ is a parameter with a value between 0 and 1.
Calculate the probability of a sample of $N$, in which $N_0$ have $x = 0$, $N_1$ have $x = 1$,
etc. and $\sum N_i = N$.
Find for what value of $\theta$ this probability is a maximum. (The value so defined is
called a maximum likelihood estimator of $\theta$. See further in § 6.1.) Hint: The sample
mean $\bar{x} = (N_1 + 2N_2 + 3N_3 + \dots)/N$.

14. Prove the statement (2.12.7). Hint: Since the variates $X_j$ are independent, the
joint density function is the product of the separate density functions. Therefore,

$M_L(h) = \int \dots \int \exp\left[ h \sum_j c_j x_j \right] f(x_1, x_2 \dots x_N)\,dx_1 \dots dx_N$

$= \int \exp[h c_1 x_1] f_1(x_1)\,dx_1 \int \exp[h c_2 x_2] f_2(x_2)\,dx_2 \dots \int \exp[h c_N x_N] f_N(x_N)\,dx_N$

where $f_j(x_j)$ is the density function for $X_j$.


C (§§ 2.14-2.19)

1. If $X$ is the number of spots showing up in a single throw with a good die, show
that the Markov inequality gives $P(X \ge 5) \le 0.7$ and the Chebyshev inequality gives
$P(|X - 3.5| \ge 2) \le 35/48$. What are the true values of these probabilities?

2. A discrete variate $X$ can take only the values $x = 1, 2, 3 \dots$ with probability
$2^{-x}$ (this is a geometric distribution). Prove that Chebyshev's inequality gives
$P(|X - 2| \ge 2) \le \frac{1}{2}$. What is the true probability? Hint: $1 + 4r + 9r^2 + 16r^3 +
\dots = (1 + r)(1 - r)^{-3}$ for $|r| < 1$.

3. For the two distributions with density functions

(a) $f(x) = 1$, $(0 \le x \le 1)$
(b) $f(x) = e^{-x}$ $(x \ge 0)$

calculate $P(|x - \mu| \ge 2\sigma)$ and compare with the value given by Chebyshev's inequality.

4. If $X$ and $Y$ are independent random variables with density functions $f(x) =
C_1 x^m e^{-x/2}$, $g(y) = C_2 y^n e^{-y/2}$ respectively, show that the density function of $W = X +
Y$ is $h(w) = C_3 w^{m+n+1} e^{-w/2}$. Hint: See Appendix A.6.

5. If $X$ and $Y$ are variates with joint density function $f(x, y)$ and if $U = Y/X$,
use the method of § 2.18 to find the density function for $U$. Show that
$h(u) = \int_{-\infty}^{\infty} |x| f(x, ux)\,dx$. Hint: Draw the line $Y = uX$ in the $X$-$Y$ plane, and show the areas
corresponding to $U \le u$. Note that $U \le u$ implies $Y \le uX$ if $X > 0$ but $Y \ge uX$ if $X < 0$.
Find the distribution function for $U$ and differentiate to get $h(u)$.

6. In Problem C-5 above, suppose that $X$ and $Y$ are independently and uniformly
distributed on the interval (0, 1), so that $f(x, y) = 1$ everywhere inside a unit square
and $f(x, y) = 0$ outside. Prove that

$h(u) = 0$,            $u < 0$
$h(u) = 1/2$,          $0 < u < 1$
$h(u) = 1/(2u^2)$,     $u > 1$


Chapter 3


THE BINOMIAL, POISSON AND NORMAL
DISTRIBUTIONS


3.1 The Binomial (or Bernoulli) Distribution  Suppose that all the individuals
in a population are divided in imagination into two sets according as they have,
or do not have, a certain attribute $A$. Such a division is called a “dichotomy” (a
cutting in two); every individual belongs either to the one set or to the other.
The attribute $A$ is often conventionally called a “success;” it may, for example,
be “head” in a population of coin-tosses or “male” in a population of children.
We assume that there is a definite probability $\theta$ that an individual chosen at
random from the population has the attribute $A$. This probability may be
estimated by taking a sample of $N$ individuals and noting the number $X$ which
are A's. The ratio $X/N$ is called the relative frequency of success, and will be
denoted by $p$. It is, of course, a random variable and, as an estimate of $\theta$, is more
reliable the larger the sample size.*

The binomial distribution is concerned with the variation of X or p among 
samples of size N from a population characterized by the parameter 0. It is 
assumed that the probability of success is unchanged by the process of selecting 
an individual for the sample, so that we must assume either that the population 
is infinite or, if it is finite, that the sampling is done “with replacements" (see 
$ 1.12). Furthermore, each item for the sample is supposed to be chosen 
independently of all the rest. 

Under these conditions the probability that the first x individuals selected
will all be A's is $\theta^x$ and the probability that the next $N - x$ will all be not-A's
is $(1 - \theta)^{N-x}$. The probability of a set of x A's followed by a set of $N - x$
not-A's is $\theta^x (1 - \theta)^{N-x}$, and this is also the probability for any other
pre-selected arrangement of x A's and $N - x$ not-A's. However, we are not
interested in the precise arrangement of A's and not-A's, merely in the total number
of A's in the sample. Hence we can combine together the probabilities for all the
$\binom{N}{x} = \frac{N!}{x!\,(N-x)!}$ permutations of x successes and $N - x$ failures, and state
that the probability of x successes, no matter in what order successes and failures
occur, is given by

(3.1.1)   $b(x, N, \theta) = \binom{N}{x}\, \theta^x (1 - \theta)^{N-x}$


*To stick to our Greek and Latin convention, the symbol for the probability should be
$\pi$ instead of $\theta$, to correspond with the sample statistic p. But the risk that $\pi$ may be
misinterpreted as 3.14159... is serious.




where x = 0, 1, 2, ..., N. This is the binomial distribution, discussed by James
Bernoulli (1654-1705) in a book published in 1713, after his death. It gives the
probability that X has a value exactly x. It is called binomial because, if we
write $1 - \theta$ as $\phi$, $b(x, N, \theta)$ is the term containing $\theta^x$ in the expansion of the
binomial $(\phi + \theta)^N$. The probability that X = x is of course the same as the
probability that p = x/N.


TABLE 3.1
    x      512 b(x, 9, 1/2)      512 B(x, 9, 1/2)
    0             1                    512
    1             9                    511
    2            36                    502
    3            84                    466
    4           126                    382
    5           126                    256
    6            84                    130
    7            36                     46
    8             9                     10
    9             1                      1
  Total         512


EXAMPLE 1  For a good coin we may take $\theta = 1/2$, so that the probability of x
heads in nine tosses will be

$b(x, 9, \tfrac{1}{2}) = \frac{9!}{x!\,(9 - x)!} \left(\tfrac{1}{2}\right)^{9}$

By giving x all possible values we obtain Table 3.1, in which the probabilities
have been multiplied by $2^9 = 512$ so as to avoid fractions.
The cumulative binomial probability is usually defined as

(3.1.2)   $B(x, N, \theta) = \sum_{u=x}^{N} b(u, N, \theta)$

It is the probability of at least x successes. Values for N = 9 and $\theta = 1/2$ are
given in Table 3.1. The probability of at least six heads in nine tosses, for
example, is 130/512 = 0.254. Note that the distribution function, as defined in
§ 1.16, is $1 - B(x + 1, N, \theta)$.

The binomial distribution, even though X is discrete, may be represented by a
histogram in which rectangles of unit base, centered at X = 0, 1, ..., N, are drawn
with heights equal to $b(x, N, \theta)$. The histogram for Table 3.1 is shown in
Figure 16. Since all the observations are actually at the centers of the intervals,
there is no grouping error (see § 2.7).

FIG. 16  BINOMIAL DISTRIBUTION, $\theta = 0.5$
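The entries of Table 3.1 can be checked directly from Eqs. (3.1.1) and (3.1.2). The following is a minimal sketch in Python; the function names `binom_pmf`, `binom_upper_tail` and `table_3_1` are illustrative choices, not notation from the text.

```python
from math import comb


def binom_pmf(x, n, theta):
    """b(x, N, theta) of Eq. (3.1.1): probability of exactly x successes."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def binom_upper_tail(x, n, theta):
    """B(x, N, theta) of Eq. (3.1.2): probability of at least x successes."""
    return sum(binom_pmf(u, n, theta) for u in range(x, n + 1))


def table_3_1(n=9, theta=0.5):
    """Rows (x, 2^n * b, 2^n * B); the scaling gives whole numbers when theta = 1/2."""
    scale = 2**n
    return [
        (x,
         round(scale * binom_pmf(x, n, theta)),
         round(scale * binom_upper_tail(x, n, theta)))
        for x in range(n + 1)
    ]
```

Calling `table_3_1()` returns one tuple per row of the table; the row for x = 6 gives the 130/512 used in the text for "at least six heads".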

Numerical values of $b(x, N, \theta)$ for given x, N and $\theta$ may be calculated by
means of logarithms of factorials. Seven-figure tables of log n! for n = 1 to
1000 are given in Glover's Tables [1] and Biometrika Tables [2]. Extensive tables
of the cumulative binomial distribution are now available (see [3] and [4]).
Separate terms of $b(x, N, \theta)$ may be obtained from these tables, if desired, by
differencing successive entries, since $b(x, N, \theta) = B(x, N, \theta) - B(x + 1, N, \theta)$,
but for most practical purposes the cumulative probabilities are more useful.
It is not necessary to tabulate values for $\theta$ beyond 0.5, since

(3.1.3)   $B(x, N, 1 - \theta) = 1 - B(N - x + 1, N, \theta)$
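The calculation by logarithms of factorials carries over directly to a computer. The sketch below assumes natural logarithms and the standard-library log-gamma function in place of printed tables of log n! (log n! = lgamma(n + 1)); the function names are illustrative. It also shows how Eq. (3.1.3) lets a tail for $1 - \theta$ be read from one for $\theta$.

```python
from math import exp, lgamma, log


def log_factorial(n):
    """Natural logarithm of n!, obtained from the log-gamma function."""
    return lgamma(n + 1)


def binom_pmf_log(x, n, theta):
    """b(x, N, theta) for 0 < theta < 1, computed from logs of factorials."""
    log_b = (log_factorial(n) - log_factorial(x) - log_factorial(n - x)
             + x * log(theta) + (n - x) * log(1 - theta))
    return exp(log_b)


def upper_tail(x, n, theta):
    """B(x, N, theta); an empty range (x > N) gives 0."""
    return sum(binom_pmf_log(u, n, theta) for u in range(max(x, 0), n + 1))


def upper_tail_by_symmetry(x, n, theta):
    """B(x, N, 1 - theta) obtained from Eq. (3.1.3) using only theta itself."""
    return 1.0 - upper_tail(n - x + 1, n, theta)
```

For instance, `upper_tail(4, 20, 0.7)` and `upper_tail_by_symmetry(4, 20, 0.3)` agree to rounding error, which is why tables need only run up to 0.5.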


3.2 Recursion Formula for Binomial Probabilities  From Eq. (3.1.1) we see,
by cancelling common factors, that

(3.2.1)   $\frac{b(x, N, \theta)}{b(x - 1, N, \theta)} = \frac{N - x + 1}{x} \cdot \frac{\theta}{1 - \theta} = 1 - \frac{x - (N + 1)\theta}{x(1 - \theta)}$

The ratio of $b(x, N, \theta)$ to $b(x - 1, N, \theta)$ is therefore greater than, or less than, 1,
according as $x < (N + 1)\theta$ or $x > (N + 1)\theta$. The values of b increase with x
as long as x is below $(N + 1)\theta$ and decrease with x when x is above $(N + 1)\theta$.
If $x = (N + 1)\theta$, the probabilities that X = x and that X = x − 1 are equal.
This is the case for Example 1 when x = 5. In general the probability is a
maximum when X is equal to the integer next below $(N + 1)\theta$.
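The ratio in § 3.2 gives a convenient way of generating the whole distribution without computing any factorials. A minimal sketch, assuming $0 < \theta < 1$ (the helper names are hypothetical):

```python
def binom_pmf_by_recursion(n, theta):
    """All b(x, N, theta), x = 0..N, starting from b(0) = (1 - theta)^N and
    multiplying repeatedly by the ratio (N - x + 1)/x * theta/(1 - theta)."""
    probs = [(1 - theta) ** n]
    odds = theta / (1 - theta)
    for x in range(1, n + 1):
        probs.append(probs[-1] * (n - x + 1) / x * odds)
    return probs


def modes(n, theta):
    """Values of x at which b(x, N, theta) is largest (two values when
    (N + 1) * theta is an integer, as in Example 1)."""
    probs = binom_pmf_by_recursion(n, theta)
    top = max(probs)
    return [x for x, p in enumerate(probs) if abs(p - top) <= 1e-12 * top]
```

With N = 9 and $\theta = 1/2$ the two equal maxima fall at x = 4 and x = 5, since $(N + 1)\theta = 5$; with N = 10 and $\theta = 0.3$ the single maximum is at x = 3, the integer next below 3.3.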


3.3 Moments and Cumulants for the Binomial Distribution  In calculating
moments, etc., for the binomial distribution, it is convenient to use the concept
of indicator function, defined in § 1.13. If the event $A_j$ is that of selecting the $j$th
item for the sample, and if $I_{A_j} = 1$ when $A_j$ is a success and $I_{A_j} = 0$ when $A_j$
is a failure, the number of successes X is simply $\sum_{j=1}^{N} I_{A_j}$. By Theorem 1.15,
$E(I_{A_j}) = P(A_j)$, which is $\theta$ for each item. Therefore,

(3.3.1)   $\mu = E(X) = \sum_j E(I_{A_j}) = \sum_j P(A_j) = N\theta$

which gives the mean of the binomial distribution. The variance $\sigma^2$ is similarly
obtainable from Bienaymé's Theorem (§ 2.14), according to which

(3.3.2)   $\sigma^2 = V(X) = \sum_j V(I_{A_j})$

where

$V(I_{A_j}) = E(I_{A_j} - \theta)^2 = E(I_{A_j}^2) - 2\theta E(I_{A_j}) + \theta^2$

But $I_{A_j}^2$ takes exactly the same values as $I_{A_j}$, namely, 0 and 1, so that
$E(I_{A_j}^2) = E(I_{A_j}) = \theta$. Therefore, $V(I_{A_j}) = \theta - \theta^2$, and from equation (2),

(3.3.3)   $\sigma^2 = \sum_j (\theta - \theta^2) = N\theta(1 - \theta)$

which is the required variance. Since p differs from X only by the constant
factor 1/N, it follows that

(3.3.4)   $E(p) = \frac{1}{N} E(X) = \theta$

(3.3.5)   $V(p) = \frac{1}{N^2} V(X) = \frac{\theta(1 - \theta)}{N}$

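The moment generating function for $I_{A_j}$ is

$M_j(h) = E(\exp[h I_{A_j}]) = \sum \exp[h I_{A_j}]\, P(I_{A_j}) = e^{0} \cdot (1 - \theta) + e^{h} \cdot \theta$

since $I_{A_j}$ is either 0 or 1, with probabilities $1 - \theta$ and $\theta$ respectively. Therefore,

$M_j(h) = 1 - \theta + \theta e^{h}$

The m.g.f. for X is therefore given (see Eq. (2.12.7)) by

(3.3.6)   $M(h) = [M_j(h)]^N = (1 - \theta + \theta e^{h})^N$

The cumulant generating function is

(3.3.7)   $K(h) = \log M(h) = N \log(1 - \theta + \theta e^{h}) = N \log\left(1 + \theta h + \frac{\theta h^2}{2!} + \frac{\theta h^3}{3!} + \cdots\right)$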

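If the logarithm is expanded in a series of powers of h, the first four successive
cumulants are found to be

(3.3.8)   $\kappa_1 = N\theta = \mu$

          $\kappa_2 = N\theta(1 - \theta) = \sigma^2$

          $\kappa_3 = N\theta(1 - \theta)(1 - 2\theta) = \sigma^2 (1 - 2\theta)$

          $\kappa_4 = N\theta(1 - \theta)(1 - 6\theta + 6\theta^2) = \sigma^2 (1 - 6\theta + 6\theta^2) = \sigma^2 \left(1 - \frac{6\sigma^2}{N}\right)$

The skewness is $\kappa_3 / \kappa_2^{3/2} = (1 - 2\theta)/\sigma$ and therefore is zero only when
$\theta = 1/2$. For values of $\theta$ less than 1/2 the distribution has a positive skewness,
which diminishes, however, as N increases, since $\sigma$ is proportional to $N^{1/2}$.
The kurtosis is

(3.3.9)   $\frac{\kappa_4}{\kappa_2^2} = \frac{1}{\sigma^2} - \frac{6}{N}$

For $\theta = 1/2$, $\sigma^2 = N/4$, and $\kappa_4 / \kappa_2^2 = -2/N$.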
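The closed forms for the skewness and kurtosis can be checked against moments computed by brute force from the distribution itself. The sketch below uses the general relations $\kappa_3 = \mu_3$ and $\kappa_4 = \mu_4 - 3\mu_2^2$ between cumulants and central moments, which are assumed here rather than quoted from this section; the function names are illustrative.

```python
from math import comb


def central_moment(r, n, theta):
    """r-th moment of X about its mean N*theta, by direct summation."""
    mu = n * theta
    return sum(
        comb(n, x) * theta**x * (1 - theta) ** (n - x) * (x - mu) ** r
        for x in range(n + 1)
    )


def skewness_and_kurtosis(n, theta):
    """(kappa_3 / kappa_2^(3/2), kappa_4 / kappa_2^2) from central moments."""
    m2 = central_moment(2, n, theta)
    m3 = central_moment(3, n, theta)
    m4 = central_moment(4, n, theta)
    kappa3 = m3
    kappa4 = m4 - 3 * m2**2
    return kappa3 / m2**1.5, kappa4 / m2**2
```

For N = 20 and $\theta = 0.3$ we have $\sigma^2 = 4.2$, so the formulas give a skewness of $0.4/\sqrt{4.2}$ and a kurtosis of $1/4.2 - 6/20$; the summation reproduces both.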
By differentiating with respect to $\theta$ the expression for the $r$th moment $\mu_r$,
namely,

(3.3.10)   $\mu_r = \sum_{x=0}^{N} \binom{N}{x} \theta^x (1 - \theta)^{N-x} (x - N\theta)^r$

we may obtain a recursion formula for the moments, namely,

(3.3.11)   $\mu_{r+1} = \theta(1 - \theta) \left[ N r \mu_{r-1} + \frac{d\mu_r}{d\theta} \right]$

From this formula, starting with $\mu_0 = 1$ and $\mu_1 = 0$, all subsequent binomial
moments may be calculated in turn by giving r the values 1, 2, 3, ....

A still simpler recursion formula is that for the cumulants,

(3.3.12)   $\kappa_{r+1} = \theta(1 - \theta) \frac{d\kappa_r}{d\theta}, \quad r \ge 1$

from which, starting with $\kappa_1 = N\theta$, the higher cumulants may readily be
obtained. (See hint to Problem A-7.)

3.4 The Bernoulli Law of Large Numbers  Let $p_N$ be the relative frequency
of success in N independent trials, the probability of success $\theta$ being the same
in all trials. Since $E(p_N) = \theta$, we have by Chebyshev's inequality, § 2.16:

(3.4.1)   $P(|p_N - \theta| \ge \lambda) \le \frac{\sigma^2}{\lambda^2}$

Here $\sigma^2 = \theta(1 - \theta)/N$, and since, for all $\theta$ between 0 and 1, $\theta(1 - \theta) \le 1/4$,
Eq. (1) becomes

(3.4.2)   $P(|p_N - \theta| \ge \lambda) \le \frac{1}{4N\lambda^2}$

For any fixed $\lambda > 0$ and any given $\varepsilon > 0$, we can always take N so large that
$1/(4N\lambda^2) < \varepsilon$. This means that for large enough N the probability that $p_N$
differs from $\theta$ by any fixed amount, however small, can be made as near to zero
as we like. This is sometimes expressed as "$p_N$ converges in probability to the
value $\theta$." Note that this is not the same thing as ordinary mathematical
convergence (see § 1.2). The law expressed by equation (2) is a form of the weak law
of large numbers.

The number N given by this equation may be quite large when $\lambda$ and $\varepsilon$ are
small. Thus if $\lambda = 0.01$ and $\varepsilon = 0.001$, we find that N > 2,500,000. As we shall
see in § 3.12, the approximation of the binomial distribution by the normal
distribution permits us to find a much smaller N satisfying the requirements.
The value above is certainly sufficient, but not necessary.
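The figure of 2,500,000 follows from solving $1/(4N\lambda^2) < \varepsilon$ for N. A small sketch of the arithmetic (the function name is hypothetical):

```python
def chebyshev_trials_bound(lam, eps):
    """Threshold N0 = 1 / (4 * lam^2 * eps): by Eq. (3.4.2), any N > N0 makes
    the probability that |p_N - theta| >= lam smaller than eps."""
    return 1.0 / (4.0 * lam**2 * eps)
```

`chebyshev_trials_bound(0.01, 0.001)` evaluates to 2,500,000 up to floating-point rounding. Halving $\lambda$ quadruples the bound, whereas halving $\varepsilon$ only doubles it.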


* 3.5 Non-Bernoulli Sampling  The two chief variants from the true
Bernoulli (binomial) sampling scheme described above are (1) the Poisson scheme
in which the probability of success $\theta_j$ at the $j$th trial varies from one trial to
another, and (2) the hypergeometric scheme (sampling without replacements
from a finite population) in which the probability at any stage depends on the
results of the previous trials.

For the Poisson scheme, we have instead of (3.3.1),

(3.5.1)   $\mu = E(X) = \sum_j P(A_j) = \sum_j \theta_j$

Also,

(3.5.2)   $\sigma^2 = \sum_j V(I_{A_j}) = \sum_j (\theta_j - \theta_j^2)$

If

$\bar{\theta} = \frac{1}{N} \sum_j \theta_j$

and if

$\sigma_\theta^2 = \frac{1}{N} \sum_j (\theta_j - \bar{\theta})^2 = \frac{1}{N} \sum_j \theta_j^2 - \bar{\theta}^2$

we have

(3.5.3)   $\sigma^2 = N\bar{\theta} - N(\sigma_\theta^2 + \bar{\theta}^2) = N\bar{\theta}(1 - \bar{\theta}) - N\sigma_\theta^2$

Here $\bar{\theta}$ is the mean of the $\theta_j$ and $\sigma_\theta^2$ is their mean square difference from the
mean. This shows that in the Poisson scheme the variance of X is less than it
would be if the probability of success were constant over all trials.
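Equation (3.5.3) can be verified numerically by building the exact distribution of X trial by trial. The sketch below is one way of doing so, with illustrative function names and an arbitrary set of trial probabilities:

```python
def poisson_scheme_distribution(thetas):
    """Exact distribution of the number of successes X when trial j succeeds
    with probability thetas[j], the trials being independent."""
    dist = [1.0]  # before any trial, X = 0 with certainty
    for th in thetas:
        new = [0.0] * (len(dist) + 1)
        for x, p in enumerate(dist):
            new[x] += p * (1 - th)   # failure: count unchanged
            new[x + 1] += p * th     # success: count increases by one
        dist = new
    return dist


def mean_and_variance(dist):
    """Mean and variance of a distribution given as probabilities of 0, 1, 2, ..."""
    mean = sum(x * p for x, p in enumerate(dist))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(dist))
    return mean, var
```

With trial probabilities 0.1, 0.3, 0.5, 0.7 we have $\bar{\theta} = 0.4$ and $\sigma_\theta^2 = 0.05$, so (3.5.3) gives $\sigma^2 = 4(0.24) - 4(0.05) = 0.76$, compared with 0.96 for four Bernoulli trials at a constant 0.4.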
In the hypergeometric scheme, suppose we have a finite population of size
M, in which the number of "successes" is S and the number of "failures" is F,
with S + F = M. The probability of success at the first trial is $\theta = S/M$. The
total number of possible different samples of size N is $\binom{M}{N}$. The number
containing x successes and $N - x$ failures is $\binom{S}{x}\binom{F}{N-x}$, so that the probability
of exactly x successes in a sample of size N is

(3.5.4)   $h(x, N, M, S) = \frac{\binom{S}{x}\binom{F}{N-x}}{\binom{M}{N}}$

This may be written

(3.5.5)   $h(x, N, M, S) = \frac{S!\, F!\, N!\, (M - N)!}{x!\, (S - x)!\, (N - x)!\, (F - N + x)!\, M!}$

When M is very large and $\theta$ not too near 0 or 1 this approximates to
$b(x, N, \theta)$.
The expression on the right of (5), when multiplied by a constant, is equal
to the coefficient of $u^x$ in the series expansion of the hypergeometric function
$F(\alpha, \beta, \gamma, u)$ with $\alpha = -N$, $\beta = -S$, $\gamma = F - N + 1$. Hence the name of the
distribution [5]. In calculating the higher moments of the hypergeometric
distribution it is simplest to obtain first the factorial moments (see § 2.11). The first
two factorial moments are

(3.5.6)   $\mu'_{[1]} = \sum_{x=1}^{N} x\, h(x, N, M, S)$

          $\mu'_{[2]} = \sum_{x=2}^{N} x(x - 1)\, h(x, N, M, S)$


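and from these we easily obtain

(3.5.7)   $\mu = \mu'_{[1]}, \qquad \sigma^2 = \mu'_{[2]} - \mu^2 + \mu$

From Eqs. (5) and (6),

(3.5.8)   $\mu'_{[1]} = \frac{S!\,F!\,N!\,(M-N)!}{M!} \sum_{x=1}^{N} \frac{1}{(x-1)!\,(S-x)!\,(N-x)!\,(F-N+x)!}$

          $= \frac{S!\,F!\,N!\,(M-N)!}{M!} \sum_{x=0}^{N-1} \frac{1}{x!\,(S-1-x)!\,(N-1-x)!\,(F-N+1+x)!}$

          $= \frac{SN}{M} \sum_{x=0}^{N-1} \frac{(S-1)!\,F!\,(N-1)!\,(M-1-(N-1))!}{x!\,(S-1-x)!\,(N-1-x)!\,(F-N+1+x)!\,(M-1)!}$

          $= \frac{SN}{M} \sum_{x=0}^{N-1} h(x, N-1, M-1, S-1)$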


Similarly, we may calculate

(3.5.9)   $\mu'_{[2]} = \frac{S!\,F!\,N!\,(M-N)!}{M!} \sum_{x=2}^{N} \frac{1}{(x-2)!\,(S-x)!\,(N-x)!\,(F-N+x)!}$

          $= \frac{S(S-1)N(N-1)}{M(M-1)} \sum_{x=0}^{N-2} \frac{(S-2)!\,F!\,(N-2)!\,(M-2-N+2)!}{x!\,(S-2-x)!\,(N-2-x)!\,(F-N+2+x)!\,(M-2)!}$

          $= \frac{S(S-1)N(N-1)}{M(M-1)} \sum_{x=0}^{N-2} h(x, N-2, M-2, S-2)$

          $= \frac{N\theta\,(S-1)(N-1)}{M-1}$

From Eq. (7), we have

(3.5.10)   $E(X) = \mu = N\theta = \frac{NS}{M}$

and

(3.5.11)   $V(X) = \sigma^2 = \mu'_{[2]} + N\theta(1 - N\theta)$

           $= N\theta \left[ \frac{(S-1)(N-1)}{M-1} + 1 - \frac{NS}{M} \right]$

           $= \frac{N\theta}{M(M-1)}\,(M - S)(M - N)$

           $= N\theta(1 - \theta)\,\frac{M - N}{M - 1}$

The mean is therefore exactly the same as in pure binomial sampling, but the
variance is less because of the factor $(M - N)/(M - 1)$. For M large compared
with N this correcting factor is nearly 1.
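The mean and variance in (3.5.10) and (3.5.11) can be confirmed by summing over the hypergeometric probabilities of Eq. (3.5.4). A minimal sketch, in which the argument order follows h(x, N, M, S) and the function names are illustrative:

```python
from math import comb


def hypergeom_pmf(x, n, m, s):
    """h(x, N, M, S) of Eq. (3.5.4): x successes in a sample of n drawn without
    replacement from m items of which s are successes."""
    if x < 0 or x > n:
        return 0.0
    # comb() returns 0 when more items are requested than are available
    return comb(s, x) * comb(m - s, n - x) / comb(m, n)


def hypergeom_mean_var(n, m, s):
    """Mean and variance of X by direct summation over x = 0..n."""
    probs = [hypergeom_pmf(x, n, m, s) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var
```

For M = 50, S = 20 and N = 10, so that $\theta = 0.4$, the summation gives a mean of 4 and a variance of $10(0.4)(0.6)(40/49) \approx 1.959$, against 2.4 for binomial sampling.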


3.6 The Poisson Distribution for Rare Events  If in a binomial distribution
the probability of success is very small (so that the event "success" may be said
to be rare) but the size of the sample N is so large that the expected number of
successes in the sample is moderate (say between 0.1 and 10), the probability of
exactly x successes is given approximately by

(3.6.1)   $p(x, \mu) = \frac{\mu^x e^{-\mu}}{x!}$

where $\mu = N\theta$. The theoretical distribution given exactly by Eq. (1), for all
integral values of x from 0 onwards, is called the Poisson distribution.

The true (binomial) probability of x successes is

(3.6.2)   $b(x, N, \theta) = \frac{N!}{x!\,(N - x)!}\, \theta^x (1 - \theta)^{N-x}$

          $= \frac{N(N - 1) \cdots (N - x + 1)}{x!} \left(\frac{\mu}{N}\right)^{x} \left(1 - \frac{\mu}{N}\right)^{N-x}$

          $= \frac{\mu^x}{x!} \left(1 - \frac{1}{N}\right) \left(1 - \frac{2}{N}\right) \cdots \left(1 - \frac{x - 1}{N}\right) \left(1 - \frac{\mu}{N}\right)^{N-x}$

where each of the x factors N, N − 1, ..., (N − x + 1) has been divided by one
of the factors of $N^x$.

In this equation we suppose that x is a fixed number but that N tends to
infinity and $\theta$ to zero in such a way that $N\theta$ tends to the fixed value $\mu$. All the
factors $1 - \frac{1}{N}$, $1 - \frac{2}{N}$, ..., and $\left(1 - \frac{\mu}{N}\right)^{-x}$ tend to the value 1, but the
limit of $\left(1 - \frac{\mu}{N}\right)^{N}$ is the number $e^{-\mu}$ (see mathematical Appendix A.1). Hence

$\lim_{N \to \infty} b(x, N, \theta) = p(x, \mu)$
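The passage to the limit can be watched numerically by holding $\mu = N\theta$ fixed and letting N grow. The sketch below compares the binomial and Poisson probabilities term by term; the function names and the cut-off `x_max` are illustrative assumptions.

```python
from math import comb, exp, factorial


def binom_pmf(x, n, theta):
    """b(x, N, theta) of Eq. (3.1.1)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def poisson_pmf(x, mu):
    """p(x, mu) of Eq. (3.6.1)."""
    return mu**x * exp(-mu) / factorial(x)


def max_abs_error(n, mu, x_max=20):
    """Largest |b(x, N, mu/N) - p(x, mu)| over x = 0..min(N, x_max)."""
    theta = mu / n
    return max(
        abs(binom_pmf(x, n, theta) - poisson_pmf(x, mu))
        for x in range(min(n, x_max) + 1)
    )
```

With $\mu = 2$ the largest discrepancy shrinks steadily as N goes from 10 to 100 to 1000, roughly in proportion to $\theta = \mu/N$.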

Examples of the Poisson distribution might be the number of blind births
per year in a large city, the number of occurrences of hands containing four aces
in an evening of bridge at a club, or the number of typographical errors per page
in professionally typed material. The numbers of births, hands of bridge or
typed symbols will be large, and the probability of the rare event described
(blind birth, hand with four aces or error) is small, but there may well be a few
such events in each instance. Considering the births, for example, we assume
that these are independent, that the probability of a blind birth remains
constant, and that the total number of births per year in the region considered is
approximately constant. If so, the annual number of blind births in the region,
as recorded over a period of several years, should fluctuate approximately in
accordance with a Poisson distribution.


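3.7 Moments and Cumulants of the Poisson Distribution  For the theoretical
distribution given by Eq. (3.6.1), x may take all integral values 0, 1, 2, .... It
may be noted that

(3.7.1)   $\sum_{x=0}^{\infty} p(x, \mu) = e^{-\mu} \sum_{x=0}^{\infty} \frac{\mu^x}{x!} = 1$

as it should be, since

$\sum_{x=0}^{\infty} \frac{\mu^x}{x!} = e^{\mu}$

The expectation of X is

(3.7.2)   $E(X) = \sum_{x=0}^{\infty} x\, p(x, \mu) = \sum_{x=1}^{\infty} \frac{\mu^x e^{-\mu}}{(x - 1)!} = \mu e^{-\mu} \sum_{x=1}^{\infty} \frac{\mu^{x-1}}{(x - 1)!} = \mu$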

The moment and cumulant generating functions may be found from those
of the binomial (§ 3.3) by writing $\theta = \mu/N$ and letting N tend to infinity. Thus

(3.7.3)   $M(h) = \lim_{N \to \infty} \left[1 + \frac{\mu(e^{h} - 1)}{N}\right]^{N} = \exp[\mu(e^{h} - 1)]$, by Appendix A.1

(3.7.4)   $K(h) = \log M(h) = \mu(e^{h} - 1) = \mu\left(h + \frac{h^2}{2!} + \frac{h^3}{3!} + \cdots\right)$

All the cumulants are therefore equal to $\mu$. In particular, the variance is $\mu$, the
skewness is $\kappa_3 / \kappa_2^{3/2} = \mu \cdot \mu^{-3/2} = \mu^{-1/2}$, and the kurtosis is $\kappa_4 / \kappa_2^2 = \mu^{-1}$.


3.8 Tables of the Poisson Function  Tables of $p(x, \mu)$ for values of $\mu$ between
0.1 and 15 may be found in Biometrika Tables [2]. More extensive tables have
been calculated by Molina and by Kitagawa [6]. The former has also tabulated
the cumulative probabilities:

(3.8.1)   $P(c, \mu) = \sum_{x=c}^{\infty} p(x, \mu)$

As an approximation to the cumulative binomial for moderately small $\theta$,
the cumulative Poisson function $P(x, \mu)$ is improved by subtracting a term
proportional to $p(x - 1, \mu)$. As shown by Gram and Charlier,

(3.8.2)   $B(x, N, \theta) \approx P(x, \mu) - \tfrac{1}{2}\theta\,(x - 1 - \mu)\, p(x - 1, \mu)$

EXAMPLE 2  For N = 10, $\theta = 0.1$, and x = 3, we have $\mu = 1$, B(3, 10, 0.1)
= 0.07019, P(3, 1) = 0.08030. The correcting term is $-0.05\,p(2, 1) = -0.00920$,
which makes the approximation 0.07110, and so improves it considerably.
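The figures of Example 2 can be reproduced in a few lines. The sketch below follows Eqs. (3.8.1) and (3.8.2) as they are used in the example (upper-tail sums, $\mu = N\theta$); the function names are illustrative.

```python
from math import comb, exp, factorial


def poisson_pmf(x, mu):
    """p(x, mu) of Eq. (3.6.1)."""
    return mu**x * exp(-mu) / factorial(x)


def poisson_upper_tail(c, mu):
    """P(c, mu): probability of c or more events, as one minus the lower sum."""
    return 1.0 - sum(poisson_pmf(x, mu) for x in range(c))


def binom_upper_tail(x, n, theta):
    """B(x, N, theta): exact probability of at least x successes."""
    return sum(comb(n, u) * theta**u * (1 - theta) ** (n - u)
               for u in range(x, n + 1))


def corrected_poisson_tail(x, n, theta):
    """Gram-Charlier corrected approximation to B(x, N, theta), for x >= 1."""
    mu = n * theta
    return (poisson_upper_tail(x, mu)
            - 0.5 * theta * (x - 1 - mu) * poisson_pmf(x - 1, mu))
```

For x = 3, N = 10, $\theta = 0.1$ these give 0.07019 (exact), 0.08030 (Poisson) and 0.07110 (corrected), as in the example.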


* 3.9 The Poisson Distribution of Random Events  The clicks heard in a
Geiger counter at a chosen location may be regarded as produced by independent
events: the passage of cosmic rays or particles from a radioactive source
through the counter. Also the result is practically independent of the precise
time of observation t (assuming that the radioactive source is relatively
long-lived). Consider, again, the arrival of incoming calls at a telephone
switchboard. Except in special circumstances of national or local excitement, the
calls may be regarded as practically independent of one another. The hypothesis
that they are also independent of t is more dubious (since there are slack times
during the day, holidays, etc.) but one five-minute period will probably be very
like another during the regular office hours, Monday to Friday.

These are examples of sequences of independent physical events, each of
which has a well-defined probability of occurring in an interval of time $\delta t$ (from
t to $t + \delta t$), where this probability, although it depends of course on $\delta t$, may be
considered independent of t. On these assumptions it is a simple matter to show
that the distribution of the number of events X occurring in a fixed interval T is a
Poisson distribution. Even though the assumptions may not be fully justified,
the distribution seems in practice to be very nearly Poisson. Certainly telephone
engineers have found that calculations based on this distribution are very useful
in designing switchboards to accommodate expected telephone traffic.

The proof that the distribution is Poisson goes as follows. Let p(t) be the
probability that exactly one event (such as a click) occurs in a time interval of
length t. Also let q(t) be the probability that no such events occur and r(t) the
probability that more than one event occurs, in the interval t. Since these three
possibilities are mutually exclusive and exhaustive,

(3.9.1)   $p(t) + q(t) + r(t) = 1$

It is reasonable to suppose that q(0) = 1, since no events will occur in an
interval of zero length. We assume also that $q'(t)$ tends to the value $-a$ ($a > 0$)
as $t \to 0$, where $q'(t) = dq(t)/dt$. This means that q(t) decreases as t increases
from zero. Furthermore, we will suppose that $r(t)/t \to 0$ as $t \to 0$, which
means that the probability of more than one event in the interval of length t tends
to zero even more rapidly than t itself. (If the probability of one event in a very
short interval is small, the probability of two or more events in the same short
interval will be of a higher order of smallness.) With these assumptions we can
show that the probability of exactly x events in the interval t is given by

(3.9.2)   $p(x, at) = \frac{(at)^x e^{-at}}{x!}$

which is the Poisson distribution with parameter at.

Let X denote the variate "number of events (such as clicks) occurring in an
interval of length t" and let n be any fixed positive integer. Subdivide the
interval into n non-overlapping sub-intervals each of length t/n. Let E be the
event "in exactly x of these sub-intervals just one click occurs" and F the event
"two or more clicks occur in at least one of the sub-intervals." Then if E occurs
and not F, the value of X will be x; and if the value of X is x, either E or F
must occur. That is,

(3.9.3)   $E \cap \bar{F} \subset (X = x) \subset E \cup F$
By Theorems 1.6 and Ln 
(3.9.4)   $P(E \cap \bar{F}) \le P(X = x) \le P(E \cup F) \le P(E) + P(F)$

But   $P(E) = P(E \cap F) + P(E \cap \bar{F}) \le P(F) + P(E \cap \bar{F})$

so that

(3.9.5)   $P(E) - P(F) \le P(E \cap \bar{F})$

From (4) and (5), we obtain

(3.9.6)   $P(E) - P(F) \le P(X = x) \le P(E) + P(F)$

We will now show that $P(F) \to 0$ as $n \to \infty$, from which it follows that
$P(X = x) - P(E) \to 0$. Let $F_i$ be the event "two or more clicks occur in the $i$th
sub-interval." Then $F = \bigcup_i F_i$ and

(3.9.7)   $P(F) = P\left(\bigcup_i F_i\right) \le \sum_i P(F_i)$

In the notation of Eq. (1), $P(F_i) = r(t/n)$, and since this is the same for each
sub-interval,

(3.9.8)   $\sum_i P(F_i) = n\, r(t/n) = t\, \frac{r(t/n)}{t/n}$

which, by the assumptions made above, tends to the value 0 as $n \to \infty$. Thus
from Eqs. (7) and (8) we see that $P(F) \to 0$, as stated.

Now P(E) is the probability of exactly x successes in n independent trials of
an event (namely the occurrence of just one click in a sub-interval of length t/n)
of which the probability of success in a single trial is p(t/n). By the binomial
theorem,

(3.9.9)   $P(E) = \binom{n}{x} p^x (1 - p)^{n-x}$

where p stands for p(t/n). Also by the assumption regarding q(t) and its
derivative,

(3.9.10)   $\lim_{t \to 0} \frac{q(t) - q(0)}{t} = \lim_{t \to 0} \frac{q(t) - 1}{t} = -a$

so that

(3.9.11)   $q(t) = 1 - at + t\,\varepsilon(t)$

where $\varepsilon(t) \to 0$ as $t \to 0$. Applying this result to the interval t/n, we have

(3.9.12)   $q\left(\frac{t}{n}\right) = 1 - \frac{at}{n} + \frac{t}{n}\,\varepsilon\left(\frac{t}{n}\right)$

By (1),

(3.9.13)   $p\left(\frac{t}{n}\right) = 1 - q\left(\frac{t}{n}\right) - r\left(\frac{t}{n}\right) = \frac{t}{n}\, b_n$

where $b_n = a - \varepsilon(t/n) - r(t/n)/(t/n)$, which tends to the value a as $n \to \infty$.
Therefore $n\,p(t/n) \to at$ (a fixed, positive number) as $n \to \infty$. But this is just the
condition for the Poisson approximation to hold; therefore,

(3.9.14)   $P(E) \to \frac{(at)^x e^{-at}}{x!} = p(x, at)$

Moreover, the probability that X = x lies between P(E) − P(F) and P(E)
+ P(F). As $n \to \infty$ both these extremes tend to the value p(x, at), and therefore
so does P(X = x). This is the Poisson distribution for random events.

The quantity a is the expected number of events (clicks) in unit time; it may
be estimated from the ratio N/T where N is the total number of clicks occurring
in a fairly long interval T.
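The squeeze in (3.9.6) can be illustrated numerically. The sketch below assumes, purely for illustration, the particular functions $q(\tau) = e^{-a\tau}$ and $p(\tau) = a\tau e^{-a\tau}$ (those of an ideal Poisson process, which satisfy the assumptions of the proof) and uses the bound $P(F) \le n\,r(t/n)$ from (3.9.7) and (3.9.8). The function names are hypothetical.

```python
from math import comb, exp, factorial


def sandwich(x, a, t, n):
    """Lower and upper bounds P(E) -/+ n*r(t/n) on P(X = x), from Eq. (3.9.6)
    with P(F) replaced by its bound n*r(t/n)."""
    tau = t / n                      # length of each sub-interval
    q = exp(-a * tau)                # assumed probability of no event
    p = a * tau * q                  # assumed probability of exactly one event
    r = 1.0 - q - p                  # probability of two or more events
    prob_e = comb(n, x) * p**x * (1 - p) ** (n - x)   # Eq. (3.9.9)
    bound_f = n * r
    return prob_e - bound_f, prob_e + bound_f


def poisson_limit(x, a, t):
    """p(x, at) of Eq. (3.9.2)."""
    return (a * t) ** x * exp(-a * t) / factorial(x)
```

As n increases the two bounds close in on p(x, at); with a = 1.5, t = 2 and n = 10,000 they are already less than 0.001 apart.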




3.10 The Normal Distribution as an Approximation to the Binomial  If the
binomial probabilities $b(x, N, \theta)$ are plotted against x for different values of N,
it will be found that, as N increases, the histograms so drawn approximate more
and more closely to a symmetrical bell-shaped curve known variously as the
normal, or Gaussian, curve, or the curve of error. The normal distribution,
represented by this curve, plays a central part in statistical theory. Since the
range of the binomial variable X, and its expectation, both increase with N, the
histograms get wider and flatter and move further to the right as N increases
(see Figure 17, which shows outlines of the histograms for $\theta = 1/2$, N = 9, 16 and
25). To avoid this it is convenient to use instead of X the standardized variate:

(3.10.1)   $Z = (X - \mu)/\sigma$

which is the difference of X from its expectation expressed in units of the standard
deviation, and at the same time to multiply the ordinates $b(x, N, \theta)$ by $\sigma$. Since
$\sigma$ is proportional to the square root of N, the effect of the change of scale is to
compress the histogram horizontally and extend it vertically, and the change
of origin keeps the center at z = 0 for all values of N. An outline of the
histogram for N = 50 is shown in Figure 18, along with the limiting normal curve.
Almost the whole of the distribution lies within about three standard deviations
on either side of the mean, between $z = \pm 3$. The approximation to the normal
curve is much better, for moderate values of N, when $\theta$ is near 0.5 than when it
is near 0 or 1.

FIG. 18  STANDARDIZED BINOMIAL DISTRIBUTION AND NORMAL CURVE ($\theta = 0.5$)

The probability that X = x is given by the binomial expression

$P(X = x) = b(x, N, \theta) = \frac{N!}{x!\,(N - x)!}\, \theta^x \phi^{N-x}$

where $\phi = 1 - \theta$. Taking logs (to base e), we have

(3.10.2)   $\log P = \log N! - \log x! - \log (N - x)! + x \log \theta + (N - x) \log \phi$

With the use of Stirling's approximation for the logarithm of n! (see
Appendix A.2), namely,

(3.10.3)   $\log n! \approx (n + \tfrac{1}{2}) \log n - n + \tfrac{1}{2} \log(2\pi)$,

Eq. (2) becomes

(3.10.4)   $\log P \approx (N + \tfrac{1}{2}) \log N - (x + \tfrac{1}{2}) \log x - (N - x + \tfrac{1}{2}) \log(N - x) - \tfrac{1}{2} \log(2\pi) + x \log \theta + (N - x) \log \phi$

If we now change to the standardized variate, putting $z = (x - \mu)/\sigma$
$= (x - N\theta)/(N\theta\phi)^{1/2}$, so that

(3.10.5)   $x = N\theta + (N\theta\phi)^{1/2} z$

and

(3.10.6)   $N - x = N\phi - (N\theta\phi)^{1/2} z$

we find, from Eq. (4),

(3.10.7)   $\log P \approx -\tfrac{1}{2}[\log N + \log \theta + \log \phi + \log(2\pi)]$

           $- [N\theta + \tfrac{1}{2} + (N\theta\phi)^{1/2} z] \log[1 + z(\phi/N\theta)^{1/2}]$

           $- [N\phi + \tfrac{1}{2} - (N\theta\phi)^{1/2} z] \log[1 - z(\theta/N\phi)^{1/2}]$

Expanding the logarithms in series and arranging the terms on the right of
Eq. (7) in powers of $N^{-1/2}$, we finally obtain after some manipulation

(3.10.8)   $\log P \approx -\tfrac{1}{2} \log(2\pi N\theta\phi) - \frac{z^2}{2} + \frac{(\phi - \theta)(z^3 - 3z)}{6 (N\theta\phi)^{1/2}} + \cdots$

where the unwritten terms are of order $N^{-1}$. For large N the terms of order
$N^{-1/2}$ may be neglected (and they vanish identically when $\theta = \phi = 1/2$). If so,

$\log P \approx -\tfrac{1}{2} \log(2\pi N\theta\phi) - \frac{z^2}{2}$

so that

(3.10.9)   $P \approx \frac{1}{\sqrt{2\pi N\theta\phi}}\, e^{-z^2/2} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2/(2\sigma^2)}$

The limiting form for $\sigma P$ is therefore given by

(3.10.10)   $\lim_{N \to \infty} \sigma P = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

The function on the right is the standardized form of the normal distribution
and will usually be denoted by $\phi(z)$. It is tabulated in Appendix B (Table 2).
For more extensive tables see references [7] and [8].
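The limit (3.10.10) can be examined numerically by comparing the scaled binomial ordinates $\sigma\, b(x, N, \theta)$ with the normal ordinate at $z = (x - N\theta)/\sigma$. A minimal sketch with illustrative function names:

```python
from math import comb, exp, pi, sqrt


def scaled_binomial_ordinate(x, n, theta):
    """sigma * b(x, N, theta), the ordinate of the standardized histogram."""
    sigma = sqrt(n * theta * (1 - theta))
    return sigma * comb(n, x) * theta**x * (1 - theta) ** (n - x)


def normal_ordinate(z):
    """Standardized normal density (2*pi)^(-1/2) * exp(-z^2 / 2)."""
    return exp(-z * z / 2) / sqrt(2 * pi)


def max_gap(n, theta):
    """Largest |sigma*b(x) - normal ordinate| over x = 0..N."""
    mu = n * theta
    sigma = sqrt(n * theta * (1 - theta))
    return max(
        abs(scaled_binomial_ordinate(x, n, theta) - normal_ordinate((x - mu) / sigma))
        for x in range(n + 1)
    )
```

For $\theta = 0.5$ the largest gap is already small at N = 50 and smaller still at N = 400; for the same N it is noticeably larger when $\theta = 0.1$, in line with the remark that the approximation is better near $\theta = 0.5$.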


3.11 Approximation of the Cumulative Binomial by the Cumulative Normal
Distribution  The distribution function for the standardized normal distribution
is

(3.11.1)   $\Phi(z) = \int_{-\infty}^{z} \phi(u)\, du$

where the integral is improper but converges (see Appendix A.3). This function
represents the area under the standardized curve from $-\infty$ up to the given
value z. It is tabulated in Appendix B and in references [7] and [8], although some
of these tables give instead of $\Phi(z)$ the integral $\int_{0}^{z} \phi(u)\, du$, which merely differs
from $\Phi(z)$ by 0.5. (Because of the symmetry of $\phi(u)$ about u = 0, $\int_{-\infty}^{0} \phi(u)\, du$
is one-half $\int_{-\infty}^{\infty} \phi(u)\, du$, and this latter integral is equal to 1; see Appendix A.7.)

The binomial probability that $X \ge x$ is given exactly by $B(x, N, \theta)$. This is
approximately equal to the probability that Z > z, where Z has a normal
distribution and $z = (x - \mu)/\sigma$. However, because the binomial distribution is
discrete while the normal distribution is continuous, a better approximation is
given by putting $z = (x - 1/2 - \mu)/\sigma$. In Figure 19, the probability $B(x, N, \theta)$
is given by the sum of the areas of the shaded rectangles, and the bases
of these rectangles extend from x − 1/2 on. It can be shown [9] that the error
involved is less than $0.140/\sigma$.

FIG. 19  CUMULATIVE BINOMIAL PROBABILITY

Various attempts have been made to give a better approximation to the
value of z which is such that

(3.11.2)   $B(x, N, \theta) = 1 - \Phi(z)$

A simple approximation is that of Freeman and Tukey [10], namely

(3.11.3)   $z \approx 2\left[\sqrt{x(1 - \theta)} - \sqrt{(N - x + 1)\theta}\,\right]$

and a more elaborate one is due to Camp and Paulson,

(3.11.4)   $z \approx \frac{a}{3 b^{1/2}}$

where

$a = 9 - x^{-1} - \left[9 - (N - x + 1)^{-1}\right] \left[\frac{(N - x + 1)\theta}{x(1 - \theta)}\right]^{1/3}$

$b = x^{-1} + (N - x + 1)^{-1} \left[\frac{(N - x + 1)\theta}{x(1 - \theta)}\right]^{2/3}$

EXAMPLE 3  For N = 35, $\theta = 0.30$ and x = 15, the true value of $B(x, N, \theta)$
is 0.07307, and the true corresponding z from Eq. (2) is 1.4533. The first
approximation, $(x - 1/2 - \mu)/\sigma$, is (14.5 − 10.5)/2.711 = 1.4755. The value given by
Eq. (3) is $2[\sqrt{10.5} - \sqrt{6.3}\,] = 1.4608$, and that given by Eq. (4) is
$(1.3829)/[3(0.10054)^{1/2}] = 1.4538$. This last one is extremely accurate.

To approximate the binomial probability that $x_1 \le X \le x_2$, namely,
$B(x_1, N, \theta) - B(x_2 + 1, N, \theta)$, we can in the same way use $z_1 = (x_1 - 1/2 - \mu)/\sigma$
and $z_2 = (x_2 + 1/2 - \mu)/\sigma$ and take as the first approximation

(3.11.5)   $P(x_1 \le X \le x_2) \approx \int_{z_1}^{z_2} \phi(u)\, du$

Closer approximations can be obtained by using equation (2) with (3) or (4)
for $B(x_1, N, \theta)$ and $B(x_2 + 1, N, \theta)$ separately.
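The comparison in Example 3 can be repeated by machine. The sketch below computes the exact tail and its normal deviate, the continuity-corrected deviate, and the Freeman-Tukey deviate $2[\sqrt{x(1 - \theta)} - \sqrt{(N - x + 1)\theta}\,]$. The Camp-Paulson function uses the usual published form of that approximation rewritten for the upper tail; it is an assumption rather than a transcription, although it reproduces the b = 0.10054 quoted in the example. All function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_upper_tail(x, n, theta):
    """Exact B(x, N, theta)."""
    return sum(comb(n, u) * theta**u * (1 - theta) ** (n - u)
               for u in range(x, n + 1))


def z_exact(x, n, theta):
    """The z for which 1 - Phi(z) equals the exact tail probability."""
    return NormalDist().inv_cdf(1.0 - binom_upper_tail(x, n, theta))


def z_continuity(x, n, theta):
    """First approximation: (x - 1/2 - mu) / sigma."""
    return (x - 0.5 - n * theta) / sqrt(n * theta * (1 - theta))


def z_freeman_tukey(x, n, theta):
    """Square-root (Freeman-Tukey) approximation."""
    return 2.0 * (sqrt(x * (1 - theta)) - sqrt((n - x + 1) * theta))


def z_camp_paulson(x, n, theta):
    """Camp-Paulson approximation, upper-tail form (assumed, see lead-in)."""
    ratio = (n - x + 1) * theta / (x * (1 - theta))
    a = 9 - 1 / x - (9 - 1 / (n - x + 1)) * ratio ** (1 / 3)
    b = 1 / x + ratio ** (2 / 3) / (n - x + 1)
    return a / (3 * sqrt(b))
```

For N = 35, $\theta = 0.30$, x = 15 the four deviates come out close to 1.453, 1.475, 1.461 and 1.453 respectively, so the ordering of accuracy described in the example is preserved.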
3.12 Bernoulli's Law of Large Numbers (Using the Normal Distribution)
We saw in § 3.4 that, from Chebyshev's inequality,

(3.12.1)   $P\left(\left|\frac{X}{N} - \theta\right| \ge \lambda\right) \le (4N\lambda^2)^{-1}$

which means that by making N large enough we can be practically certain that
X/N will differ from $\theta$ by as small an amount as we please. However, if we use the
normal approximation to the cumulative binomial we can achieve the same result
with a much smaller value of N.

If $x_1 = N(\theta - \lambda)$ and $x_2 = N(\theta + \lambda)$, $x_1$ and $x_2$ need not be integers when
$\lambda$ is arbitrary. The probability that $|X/N - \theta| < \lambda$, which is the probability
that X lies between $x_1$ and $x_2$, is, for large N, close to the probability that Z
lies between $z_1$ and $z_2$, where $z_1 = -N\lambda/\sigma$ and $z_2 = N\lambda/\sigma$. This probability
is $\int_{z_1}^{z_2} \phi(u)\, du = 2\Phi(z_2) - 1$, since $z_1 = -z_2$. The required probability
$P(|X/N - \theta| \ge \lambda)$ is therefore $1 - (2\Phi(z_2) - 1) = 2(1 - \Phi(z_2))$. In order to
make this less than some fixed $\varepsilon$ for a given $\lambda$, we have to choose $z_2$ so that
$\Phi(z_2) > 1 - \varepsilon/2$. This provides a lower limit for N.

Thus if $\lambda = 0.01$ and $\varepsilon = 0.001$, we find that $z_2 > 3.29$. Since $\sigma^2 = N\theta(1 - \theta)$,
we have then

(3.12.2)   $N^{1/2}\lambda > 3.29\,[\theta(1 - \theta)]^{1/2}$

For all values of $\theta$, $\theta(1 - \theta) \le 1/4$, so that this inequality will certainly be
satisfied if $N^{1/2}\lambda > 1.645$, and therefore if N > 27,060. This is considerably
better than the bound on N (2,500,000) obtained in § 3.4 by the use of Chebyshev's
inequality.

EXAMPLE 4  If $\theta = 3/5$ and N = 600, what is the probability that the relative
frequency of success will differ from 3/5 by less than 0.01?

Here $\sigma = [N\theta(1 - \theta)]^{1/2} = 12$ and $N\theta = 360$. Also $P(|X/N - 0.6| < 0.01)$
$= P(354 < X < 366) = P(z_1 < Z < z_2)$, where Z = (X − 360)/12, $z_1 = -0.458$,
$z_2 = 0.458$. This last probability is $2\Phi(0.458) - 1 = 0.353$, which is the
probability required.
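Both calculations of § 3.12 can be reproduced with the standard library's normal distribution. In the sketch below, `normal_prob_between` applies the continuity correction of § 3.11 to an inclusive range of integers, and `normal_trials_bound` solves $N^{1/2}\lambda > z\,[\theta(1 - \theta)]^{1/2}$ for the worst case $\theta(1 - \theta) = 1/4$. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def normal_prob_between(x1, x2, n, theta):
    """Normal approximation to P(x1 <= X <= x2) with continuity correction."""
    mu = n * theta
    sigma = sqrt(n * theta * (1 - theta))
    std = NormalDist()
    z1 = (x1 - 0.5 - mu) / sigma
    z2 = (x2 + 0.5 - mu) / sigma
    return std.cdf(z2) - std.cdf(z1)


def normal_trials_bound(lam, eps):
    """Value N0 such that N > N0 gives P(|X/N - theta| >= lam) < eps for every
    theta, according to the normal approximation."""
    z = NormalDist().inv_cdf(1.0 - eps / 2.0)   # two-sided critical value
    return (z / (2.0 * lam)) ** 2
```

In Example 4 the event 354 < X < 366 is the inclusive range 355 to 365, and `normal_prob_between(355, 365, 600, 0.6)` gives about 0.353. `normal_trials_bound(0.01, 0.001)` gives about 27,070; the text's 27,060 results from rounding the critical value to 3.29.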


3.13 Properties of the Normal Distribution  The probability density for the
standardized normal distribution is

(3.13.1)   $\phi(z) = (2\pi)^{-1/2} e^{-z^2/2}, \quad -\infty < z < \infty$

This is an even function, $\phi(z) = \phi(-z)$, with a maximum value 0.3989 at
z = 0. The quartiles $Q_3$ and $Q_1$ are at $z = \pm 0.6745$, since

(3.13.2)   $\int_{0}^{0.6745} \phi(z)\, dz = 0.25$

The probability is therefore 0.5 that z has a value between $-0.6745$ and
$+0.6745$. There is an even chance that a normal variate lies between the mean
plus 0.6745 times the standard deviation and the mean minus 0.6745 times the
standard deviation.

The moment generating function for the standard normal distribution is

(3.13.3)   $M(h) = \int_{-\infty}^{\infty} e^{hz} \phi(z)\, dz = (2\pi)^{-1/2} \int_{-\infty}^{\infty} e^{hz - z^2/2}\, dz$

If we change the variable of integration from z to u, where u = z − h, this
becomes

(3.13.4)   $M(h) = (2\pi)^{-1/2} e^{h^2/2} \int_{-\infty}^{\infty} e^{-u^2/2}\, du = e^{h^2/2} = 1 + \frac{h^2}{2} + \frac{1}{2!}\left(\frac{h^2}{2}\right)^2 + \cdots$

The moments about the mean are therefore $\mu_2 = 1$, $\mu_4 = 3$, $\mu_6 = 15$, etc. The
odd-order moments are all zero, as is obvious from the symmetry of the
distribution about z = 0.

The cumulant generating function is

(3.13.5)   $K(h) = \log M(h) = \tfrac{1}{2} h^2$

The only non-zero cumulant is therefore

(3.13.6)   $\kappa_2 = 1$

which expresses the fact that the variance is unity, as of course it must be for a
standardized distribution. For the non-standardized normal distribution, with
mean $\mu$ and variance $\sigma^2$, we have, by § 2.11,

(3.13.7)   $\kappa_1 = \mu, \qquad \kappa_2 = \sigma^2$

The great simplicity of the system of cumulants for the normal distribution is
one reason for the importance of this distribution in statistical theory.

The normal curve is asymptotic to the z-axis as $z \to \pm\infty$. Practically, it
almost touches the axis beyond $z = \pm 4$. Table 3.2 gives the proportion of
area beyond $z = \pm z_0$ for a few selected values of $z_0$, and therefore represents the
probability that a standard normal variate will have a value outside the given
interval. This table will be useful later on in problems of estimation where the
variate concerned may be regarded as approximately normal.

3.14 Probability Graph Paper  The graph of the distribution function $\Phi(z)$
is a roughly S-shaped curve, looking something like the cumulative frequency 
polygon of Figure 14b, but of course smooth throughout. If the scale on the 
graph paper is properly adjusted, this curve may be straightened. In Figure 
20, the data of Table 2.2 are plotted on special probability graph paper [11]. 
The scale along the axis of x is uniform, but on the axis representing percentage 
cumulative frequency the scale is compressed in the middle and extended near the 
top and bottom so that the polygon of Figure 14b becomes almost a straight 
line. The points marked in the diagram are the values of 100F/N, or, in this
case, F/10, plotted against corresponding values of x,. The fact that these 
points apparently lie close to a straight line is good presumptive evidence that the 
distribution of the variate X (in the population from which the sample is taken) 
is approximately normal. A method of testing this presumption will be dis- 
cussed later. 


TABLE 3.2
   z_0      P(|z| > z_0)        z_0       P(|z| > z_0)
   1          0.3173          0.6745        0.5
   1.5        0.1336          1.2816        0.2
   2          0.0455          1.6449        0.1
   2.5        0.0124          1.9600        0.05
   3          0.0027          2.3263        0.02
   3.5        0.00046         2.5758        0.01
   4          0.00006         3.2905        0.001
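The entries of Table 3.2 can be regenerated from the standard normal distribution function, reading the left half of the table as "probability for a given $z_0$" and the right half as "$z_0$ for a given probability". A minimal sketch using the standard library (function names are illustrative):

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_sided_tail(z0):
    """P(|z| > z0) = 2 * (1 - Phi(z0)) for z0 >= 0."""
    return 2.0 * (1.0 - _STD.cdf(z0))


def z_for_two_sided_tail(prob):
    """The z0 for which P(|z| > z0) equals prob (0 < prob < 1)."""
    return _STD.inv_cdf(1.0 - prob / 2.0)
```

For example, `two_sided_tail(2)` is about 0.0455 and `z_for_two_sided_tail(0.05)` is about 1.9600, matching the table.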


If a straight line is drawn by eye as evenly as possible between the plotted
points, one can make a quick rough estimate of the median, quartiles, etc. for the 
distribution (by noting the values of x corresponding to percentage cumulative 
frequencies of 50, 25, 75, etc.). One can also estimate readily the probabilities 
that chosen values of x will be exceeded in the population. 


* 3.15 The Angular Transformation for Binomial Variates If a variate is
thought to be binomial, transformations such as those given in equations (3.11.3)
and (3.11.4) will aid in the approximation to normality. That is, the quantity
on the right-hand side of each of these equations is approximately a standard
normal variate. However, a different transformation is often used with a
different purpose in mind, namely to make the variance more nearly constant
(independent of $\theta$). The so-called angular transformation which is appropriate
here, and which was suggested by Fisher, is


(3.15.1)  $A = \sin^{-1}(p^{1/2})$


where $p = X/N$, the observed proportion of successes, and $A$ is the angle (in
degrees) whose sine is $p^{1/2}$. A table of values of $A$ for given $p$ may be found
in [12].

Fig. 20 DATA OF TABLE 2.2 PLOTTED ON PROBABILITY GRAPH PAPER

As we have seen, the variance of $p$ is $\theta(1 - \theta)/N$, but it turns out that the
variance of $A$ is approximately $821/N$, whatever $\theta$ may be. The angular
transformation is therefore often advisable in testing experimental results by the
method of analysis of variance, to be discussed later, in which constancy of the
variance in different circumstances is usually assumed. A decidedly
non-rigorous but simple proof of the effect of this transformation on variance goes
as follows:

A small variation $\delta A$ in $A$ is related to the corresponding variation $\delta p$ in $p$ by
the equation

(3.15.2)  $\delta A \approx \frac{dA}{dp}\,\delta p = \frac{180}{\pi}\,\frac{1}{2[p(1-p)]^{1/2}}\,\delta p$

so that

(3.15.3)  $(\delta A)^2 \approx \left(\frac{90}{\pi}\right)^2 \frac{(\delta p)^2}{p(1-p)} = \frac{821\,(\delta p)^2}{p(1-p)}$


Now the variance of $p$ may be regarded as the expectation of $(\delta p)^2$ where $\delta p$
is a sampling fluctuation about the mean, and a similar interpretation holds for
the variance of $A$. Therefore

(3.15.4)  $V(A) \approx \frac{821}{p(1-p)}\,V(p)$

Since the variance of $p$ is approximately $p(1 - p)/N$, we obtain

(3.15.5)  $V(A) \approx \frac{821}{N}$


A more precise argument [13] shows that as $N \to \infty$ the distribution of $A$
does in fact tend to a normal distribution with mean $\sin^{-1}\theta^{1/2}$ and variance
$821/N$.

A slight modification of the transformation, namely,

(3.15.6)  $A = \frac{1}{2}\left[\sin^{-1}\left(\frac{X}{N+1}\right)^{1/2} + \sin^{-1}\left(\frac{X+1}{N+1}\right)^{1/2}\right]$


gives a quantity $A$ whose variance is within $\pm 6\%$ of $821/(N + \frac{1}{2})$ for almost all
binomial distributions with $N\theta > 1$.
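
The variance-stabilizing effect can be illustrated by exact enumeration over the binomial distribution. This is a rough sketch rather than anything from the original text: the sample size N = 50, the chosen values of $\theta$, and the function names are all assumptions made for illustration, and only the simple transformation of Eq. (3.15.1) is used.

```python
import math


def binomial_pmf(x: int, n: int, theta: float) -> float:
    """Probability of exactly x successes in n binomial trials."""
    return math.comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def angular_variance(n: int, theta: float) -> float:
    """Exact variance (degrees squared) of A = arcsin(sqrt(X/n))."""
    angles = [math.degrees(math.asin(math.sqrt(x / n))) for x in range(n + 1)]
    probs = [binomial_pmf(x, n, theta) for x in range(n + 1)]
    mean = sum(p * a for p, a in zip(probs, angles))
    return sum(p * (a - mean) ** 2 for p, a in zip(probs, angles))


N = 50  # illustrative sample size
for theta in (0.1, 0.3, 0.5, 0.7, 0.9):
    var_p = theta * (1.0 - theta) / N      # variance of p, depends on theta
    var_a = angular_variance(N, theta)     # variance of A, roughly constant
    print(f"theta={theta:.1f}  V(p)={var_p:.5f}  V(A)={var_a:.2f}  821/N={821 / N:.2f}")
```

The variance of p changes by a factor of nearly three across these values of $\theta$, whereas the variance of A should stay fairly close to 821/N.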


* 3.16 The Square Root Transformation for Poisson Variates If $X$ is a
Poisson variate with mean $\mu$, we know that $V(X) = \mu$ and the skewness of $X$,
given by $\kappa_3/\kappa_2^{3/2}$, is $\mu^{-1/2}$. A transformation which serves to stabilize the
variance approximately is

(3.16.1)  $Y = X^{1/2}$

By a similar argument to that in § 3.15, $\delta Y \approx \frac{1}{2}X^{-1/2}\,\delta X$ and $(\delta Y)^2 \approx \frac{1}{4X}(\delta X)^2$,
so that $V(Y) \approx \frac{1}{4\mu}V(X) = \frac{1}{4}$, since $X \approx \mu$ (which is the expectation of $X$).

More precisely [13], if

(3.16.2)  $Y = (X + \alpha)^{1/2}$

the distribution of $Y - (\mu + \alpha)^{1/2}$ tends as $\mu \to \infty$ to a normal distribution with
mean 0 and variance $\frac{1}{4}$. If $\alpha = 0$, this means that $E(Y) \approx \mu^{1/2}$. Actually, for
large $\mu$ and $\alpha = 0$,

$E(Y) \approx (\mu - \frac{1}{4})^{1/2}$

and the skewness of $Y$ is $-1/(2\mu^{1/2})$ approximately, which indicates that the
normality is not greatly improved by the transformation (the skewness is halved
numerically).

Bartlett [14] found that if $\alpha = \frac{1}{2}$, the variance is usually considerably nearer
to $\frac{1}{4}$ for moderate values of $\mu$ than if $\alpha = 0$, and he recommended the use of the
transformation $Y = (X + \frac{1}{2})^{1/2}$. Johnson and Anscombe [15] recommend
$\alpha = \frac{3}{8}$.

Anscombe [16] has pointed out that a better transformation, if we are
interested in normalizing, is

(3.16.3)  $Y = X^{2/3}$

The variance of $Y$ is about $\frac{4}{9}\mu^{1/3}$ and the skewness is of order $\mu^{-3/2}$. The
expectation of $Y$ is $(\mu - \frac{1}{6})^{2/3}$, so that if we want a normal approximation to the
Poisson cumulative function $P(c, \mu)$, we may write

(3.16.4)  $P(c, \mu) \approx 1 - \Phi(z)$

where

(3.16.5)  $z = \frac{(c - \frac{1}{2})^{2/3} - (\mu - \frac{1}{6})^{2/3}}{\frac{2}{3}\mu^{1/6}}$

The term $c - \frac{1}{2}$ is used instead of $c$ as a correction for continuity, like that
discussed in § 3.11.

EXAMPLE 5 Let $\mu = 4$, $c = 6$. The true probability that $X \geq 6$ is $P(6, 4)$
$= 0.2149$. The value of $z$ given by Eq. (5) is $z = [(5.5)^{2/3} - (23/6)^{2/3}]/[(2/3)4^{1/6}]$
$= 0.7935$ and the corresponding probability is 0.2138, which is a fairly good
approximation to the truth.
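
The arithmetic of Example 5 can be reproduced in a few lines. This is a sketch only: it assumes the two-thirds-power form of Eq. (3.16.5) with the continuity correction $c - \frac{1}{2}$ and the centering term $\mu - \frac{1}{6}$, and the helper names are hypothetical.

```python
import math


def poisson_upper_tail(c: int, mu: float) -> float:
    """Exact P(X >= c) for a Poisson variate X with mean mu."""
    below = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(c))
    return 1.0 - below


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def two_thirds_z(c: int, mu: float) -> float:
    """Normal deviate from the two-thirds-power transformation."""
    numerator = (c - 0.5) ** (2.0 / 3.0) - (mu - 1.0 / 6.0) ** (2.0 / 3.0)
    return numerator / ((2.0 / 3.0) * mu ** (1.0 / 6.0))


def approx_upper_tail(c: int, mu: float) -> float:
    """Approximate P(X >= c) as 1 - Phi(z)."""
    return 1.0 - normal_cdf(two_thirds_z(c, mu))


print(round(poisson_upper_tail(6, 4.0), 4))   # exact tail probability
print(round(two_thirds_z(6, 4.0), 4))         # normal deviate z
print(round(approx_upper_tail(6, 4.0), 4))    # normal approximation
```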
PROBLEMS
A. (§§ 3.1-3.3)
1. In a binomial distribution with 0 = 2, calculate the probability that the number 
of successes in six trials is either 1, 2 or 3. 
2. If ten “good” coins are tossed, what is the probability of (a) at least three heads, 
(b) not more than three heads? 


3. Show that the greatest value of $\binom{2n}{x}$ for positive integral values of $x$ occurs
when $x = n$. What does this tell us about the binomial distribution with $\theta = \frac{1}{2}$?

4. An antiaircraft battery in England during World War II had on the average 
three out of five successes in shooting down "flying bombs" that came within range.
What was the chance that if eight bombs came within range, not more than two got 
through the barrage without being shot down? 

5. A and B play a game in which 4° chance of winning is $. In a series of eight 
such games, supposedly independent, what is the chance that A will win at least six? 

6. If the probability of success in a single trial is 0.01, how many independent trials 
are necessary in order to have the probability of at least one success greater than 4? 
Hint: Find n so that 1 — (0.99)" > 4. 

7. Prove the relation of Eq. (3.3.12) for successive binomial cumulants. 


" K(h) ак, а е^ — 1 
Hint: кг = |а that — = i — БИР 
"USA | аһ" Ls за tha 49 i E: ( —O+ |. 


K(h) а ber 
Also, к = | d+! —— = | MN 
"OSA | dca]. vss ( =O: =) |... 


8. In a series of $n$ trials of a binomial distribution the numbers of successes and
failures are $n_1$ and $n_2$. Calculate the covariance of $n_1$ and $n_2$ and the variance of the
difference $n_1/n - n_2/n$. Hint: $C(n_1, n_2) = E(n_1 n_2) - E(n_1)E(n_2)$. For second part,
see § 2.14.

B. (§§ 3.4-3.5)

1. If 1,000 trials are made of an event with probability of success } in each trial, 
find the Bernoulli upper limit for the probability that the proportion of successes will 
differ from 2 by as much as 0.05. 

2. How many trials must be made of an event with binomial probability of success
$\frac{1}{2}$ in each trial, in order to be assured (by the Bernoulli law) with probability at least
0.9 that the relative frequency of success will be between 0.48 and 0.52?

3. Suppose that in a Poisson sampling scheme the probability of success on the
$j$th trial is always either 0 or 1, and that in $N$ trials there are $N_1$ cases of $\theta_j = 0$ and $N_2$ of
$\theta_j = 1$. Show that the formula of Eq. (3.5.3) for the variance of the number of successes
reduces, as it should, to zero.

4. If px is the proportion of successes in N independent trials, the probability of 
success at the j'^ trial being бу, prove that Pw converges in probability to 0, where 


1 р 
9 = у> 0, Hint: Show that Р(|рх — 6| > А) сап be made arbitrarily small for any 
fixed A > 0. 

5. Two persons are picked at random from a group of five persons, consisting of 
three men and two women. Let X represent the number of men in the sample picked. 
Write down the probabilities for the possible values of X. Calculate the expectation 
and variance of X and so verify the formulas of Eqs. (3.5.10) and (3.5.11). 

C. (§§ 3.6-3.9)

1. A Poisson distribution is such that the probability is the same for $X = 1$ and
for $X = 2$. What is this probability?

2. A liquid culture medium contains on the average $\mu$ bacteria per milliliter. Many
samples are taken, of 1 ml each, and the total number of bacteria in each sample is
counted. Assuming that the distribution is Poisson and that 10% of the samples are
free from bacteria, estimate $\mu$.

3. If on the average the proportion of defective fuses in a large consignment is 
0.015, calculate the approximate probability that in a box of 200 fuses there will not 
be more than 2 defective. 

4. A seed distributor finds that on the average 5% of his seeds will not germinate. 
He puts them up in packages of 100 and guarantees 90% germination. Find an 
expression for the probability that a given package will violate the guarantee.

5. Suppose that the number of telephone calls received by an operator in a
particular 5-min interval, say from 9:30 a.m. to 9:35 a.m., is a Poisson variate with mean
4. Find the probability that on a future working day the operator will receive in this 
interval of time (a) not more than one call, (b) six or more calls. 

6. A retailer with limited storage space finds that, on the average, he sells two boxes
of parrot food per week. He replenishes his stock every Monday morning so as to
start the week with four boxes on hand. What are the probabilities that (a) he sells his
entire stock in a week, (b) he is unable to fill at least one order? With how many
boxes should he start the week so as to have a probability at least 0.99 of being able
to fill all orders? Hint: Assume a Poisson distribution of sales with mean 2, and find
the probability of $x$ or more sales.

7. Show that if $X$ is a Poisson variate with mean $\mu$, then $E(X^2) = \mu E(X + 1)$.

8. The mean absolute deviation (m.a.d.) about the mean for a variate $X$ is defined
as $E(|X - \mu|)$. Show that for a Poisson variate with $\mu = 1$, the m.a.d. is $2/e$ times the
standard deviation.

9. Prove that the sum of two independent Poisson variates with means $\mu_1$ and $\mu_2$
is Poisson with mean $\mu_1 + \mu_2$. Hint: Use Eqs. (2.12.8) and (3.7.4).


D. % 3.10-3.14) 
- From the tables in Ар 
(2.07) and Ф(—1.63). 


2. (a) Determine z so that MO du = 3; (b) I 


Use linear i i : lues 
‘ar interpolat n the tabular values. ER 
F 3. А variate X is distributed normally with mean 12 and standard deviation 2. 
ind the probability that X lies between 9.5 and 13.0. | 
4. A sample of za 1500 is normally distributed with p = 75 and ø = 10. Find (a) 
the value of X such that the corresponding cumulative frequency (F) is 800, (b) the 


Number of items in the sample with X < 80. 


5. The median of a normal distribution is 89.0 an 


is the st отат | 
‚6. pea rbi ap company operating à subway uses pee of light bulbs 
ìn its underground stations. On the morning of January 1, 1960, t e ri in into 
Service 5.000 new bulbs. Assuming that the distribution of length of Ше tor these 
bulbs is normal, with a mean of 50 days and а standard deviation of 19 days; how many 
9f them would need to be replaced by en = mmn Sa 1960? 
9wm ? unt January . | | 
p^ A p dues P s is divided into three classes E asa сеп. 
length-breadth index X is (a) under 75, (b) from 75 to S0 Ooye 80. Thee aE ed, 
Tespectively, dolichocephalic (long-headed), mesocephalic (me ШШ? аш М rachyce- 
Phalic (short-headed) ‘Assuming that the distribution of X is noma m e at out of 
Skulls examined the numbers in the three classes are 29, 19, and 2, find the mean and 


pendix B.2, write down the values of $(1.75), $(—0.64), 


f D(z) = 0.43, calculate z. Hint: 


d the first quartile is 75.5. What 
standard deviation of $X$ for this collection. Hint: Find the values of $z$ for $X = 75$ and
80 from the corresponding values of $\Phi(z)$.

8. The mean height of soldiers in a regiment containing 1,000 men is 68.22 in., with 
a standard deviation of 3.29 in. If the distribution is normal, how many men over 6 ft 
tall would you expect to find in the regiment? 

9. Verify that the points of inflexion of the standard normal curve are at $z = \pm 1$,
and that the tangents to the curve at these points meet the $z$-axis at $z = \pm 2$. Hint: At
the points of inflexion the second derivative of $\phi(z)$ vanishes.

10. Use Eq. (3.11.2) to find an approximation to the probability of at least 7
successes in 20 independent trials when the probability of success in each trial is $\frac{1}{4}$.
Also calculate the approximations given by Eqs. (3.11.3) and (3.11.4) and compare
with the true value, 0.2142.

11. Calculate an approximation to the probability that in 1,000 binomial trials,
with probability of success $\frac{1}{2}$ in each trial, the number of successes will be outside the
limits 481 to 519 inclusive. What is the upper bound on this probability given by
Chebyshev's inequality?

12. A normal distribution with mean $\mu$ and variance $\sigma^2$ is truncated at $X = a$ and
all values less than $a$ are discarded. Show that the mean of the truncated distribution
is $\mu + \sigma\phi(\alpha)/[1 - \Phi(\alpha)]$, where $\alpha = (a - \mu)/\sigma$. Hint: $\int_\alpha^\infty v\phi(v)\,dv = -[\phi(v)]_\alpha^\infty = \phi(\alpha)$.


13. Give an alternative “proof” of the theorem that the limiting form of the
standardized binomial variate, as $N \to \infty$, is the standardized normal variate, by showing that
the c.g.f. of the binomial tends to that of the normal as $N \to \infty$, and assuming that a
distribution is uniquely determined by its c.g.f. Hint: The c.g.f. for the binomial is
$N\log(1 - \theta + \theta e^h)$. For the standardized binomial we must subtract $\mu h/\sigma$ and replace
$e^h$ by $e^{h/\sigma}$, where $\mu = N\theta$ and $\sigma^2 = N\theta(1 - \theta)$ (see § 2.12). Expand the logarithm in
powers of $h$ and show that $K(h) \to h^2/2$ as $N \to \infty$.

14. Use Stirling's approximation (Appendix A.2) to prove that if the probability
of success in a single trial is $\frac{1}{2}$, then in a series of $n$ binomial trials the probability of
exactly $x$ successes is $(2/\pi n)^{1/2}\exp[-2(x - n/2)^2/n]$, neglecting terms of order $1/n$.

E. (§§ 3.15-3.16)

1. Use the method of § 3.15 to show that if the variance of $X$ is approximately
proportional to $X^2$ then the transformation $Y = \log X$ produces a variate with
approximately constant variance.

2. Show that if the variance of $X$ is approximately proportional to $(1 - X^2)^2$ then
a suitable transformation for producing a variate with approximately constant variance
is $Y = \frac{1}{2}\log[(1 + X)/(1 - X)]$.

3. The following table gives the percentage damage by boll weevils on cotton plants
treated in various ways, there being five replications for each treatment. Use the simple
angular transformation, Eq. (3.15.1), to obtain corresponding values of $A$. Calculate
the estimated variance ($k_2$) for each treatment, both for the original variate $X$ and for
the angular variate $A$. Does the variance appear to be more nearly constant (as between
treatments) after the transformation than before?

                        Treatments
Replications     1     2     3     4     5
     1          18    17    27    34    42
     2          18    14    12    27    42
     3          14    14    17    23    25
     4          10     8    12    26    24
     5          11     9    11    15    22

Chapter 4
OTHER PROBABILITY DISTRIBUTIONS 


4.1 Reasons for Studying Probability Distributions The main purpose of 
studying distributions, such as those in this chapter and in Chapter 3, is to be 
able to draw inferences about populations which we can sample. An empirical 
sampling distribution will be more or less irregular, but its form may suggest that 
the population distribution is closely normal or Poisson or of some other 
well-known mathematical type. In the next chapter we shall discuss the
procedures for finding the parameters of a theoretical curve to make it fit the observed
distribution as closely as possible. Once having done this, we can proceed to
make mathematical deductions about the population and perhaps test these by
further observations.


can be calculated, nowadays us 
For reasons of mathematical 


4.2 The Rectangular (Uniform) Distribution This has already been
mentioned in Example 4, § 2.10. The continuous variate $X$ has the probability
density $f(x) = (\beta - \alpha)^{-1}$, $\alpha < x < \beta$, and $f(x) = 0$ for $x > \beta$ and $x < \alpha$. The
density function is discontinuous at $x = \alpha$ and $x = \beta$ (see Figure 21). Moments
of all orders may easily be calculated.

An interesting property of continuous variates is that if $F(x)$ is the
distribution function of $X$, and if we make the transformation

(4.2.1)  $Y = F(X)$

then $Y$ has a uniform distribution, with $f(y) = 1$, on the interval (0, 1).
Since any distribution function has a range from 0 to 1, the values of $Y$ must
obviously be confined to the interval 0 to 1. Let $G(y)$ be the distribution function
of $Y$, and let $a$ and $b$ be values of $Y$ corresponding to values $a'$ and $b'$ of $X$.
Then

$a = F(a'), \quad b = F(b')$

The probability that $Y$ lies between $a$ and $b$ is the same as the probability
that $X$ lies between $a'$ and $b'$, which is $F(b') - F(a')$. Therefore,

(4.2.2)  $G(b) - G(a) = F(b') - F(a') = b - a$

where $0 \le a < b \le 1$.

Fig. 21 RECTANGULAR DISTRIBUTION

If we replace $a$ by $y$ and $b$ by $y + \Delta y$, this becomes $[G(y + \Delta y) - G(y)]/\Delta y = 1$, or,
in the limit as $\Delta y \to 0$,

(4.2.3)  $\frac{dG(y)}{dy} = 1, \quad 0 < y < 1$

The derivative of $G(y)$ is the density function of $Y$, namely, $g(y)$, so that

(4.2.4)  $g(y) = 1, \quad 0 < y < 1$

which shows that the distribution of $Y$ is rectangular. The transformation
expressed by Eq. (1) is called the probability transformation. It is sometimes
useful, in proving general theorems about continuous distributions, to be able to
transform them to so simple a distribution, mathematically speaking, as the
rectangular one.
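
The probability transformation can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes X to be exponential with unit mean, so that $F(x) = 1 - e^{-x}$, and the sample size, seed, and function names are arbitrary choices.

```python
import math
import random


def exponential_cdf(x: float) -> float:
    """Distribution function F(x) = 1 - exp(-x) of a unit exponential variate."""
    return 1.0 - math.exp(-x)


def transformed_sample(n: int, seed: int = 0) -> list:
    """Draw n values of X and return the transformed values Y = F(X)."""
    rng = random.Random(seed)
    return [exponential_cdf(rng.expovariate(1.0)) for _ in range(n)]


def empirical_cdf(sample: list, y: float) -> float:
    """Proportion of the sample not exceeding y."""
    return sum(1 for v in sample if v <= y) / len(sample)


ys = transformed_sample(20000)
for y in (0.1, 0.25, 0.5, 0.75, 0.9):
    # For a rectangular variate on (0, 1) the proportion should be close to y.
    print(y, round(empirical_cdf(ys, y), 3))
```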

4.3 Distribution Function of a Transformed Variate If $u(x)$ is a given
function of $x$, and if $x$ is a value assumed by a random variable $X$, then $u$ may
be regarded as a value of a new random variable $U$. The distribution function
of $U$ is

(4.3.1)  $G(u) = P(U \le u) = \int_{u(x) \le u} f(x)\,dx$

where $f(x)$ is the probability density for $X$ and the integral is taken over all
values of $x$ such that $u(x) \le u$. The density function $g(u)$ is found by differentiating
$G(u)$ with respect to $u$. If the variate $X$ is discrete, the integral in Eq. (1) must be
replaced by a sum.


EXAMPLE 1 If $f(x) = 1$, $0 < x < 1$, and $u(x) = -2\log x$, then $u(x) \le u$
if and only if $x \ge e^{-u/2}$. Therefore,

$G(u) = \int_{e^{-u/2}}^{1} dx = 1 - e^{-u/2}$

and

(4.3.2)  $g(u) = \frac{1}{2}e^{-u/2}, \quad 0 < u < \infty$

This distribution is illustrated in Figure 22.

Fig. 22 EXPONENTIAL DISTRIBUTION


EXAMPLE 2 Suppose $f(x) = (2/9)(x + 1)$, $-1 \le x \le 2$, and $u(x) = x^2$.
The range of $u$ is $0 \le u \le 4$, but in the interval from 0 to 1 there are two values
of $x$ corresponding to any given $u$ (e.g., $u = \frac{1}{4}$ for $x = \frac{1}{2}$ or $x = -\frac{1}{2}$). Since
$x = \pm\sqrt{u}$ and runs from $-1$ to 2, the interval of $x$ corresponding to $u(x) \le u$
is from $-\sqrt{u}$ to $\sqrt{u}$ as long as $u < 1$ but only from $-1$ to $\sqrt{u}$ when $u > 1$.
Therefore,

$G(u) = \frac{2}{9}\int_{-\sqrt{u}}^{\sqrt{u}} (x + 1)\,dx = \frac{4}{9}\sqrt{u}, \quad 0 \le u \le 1$

and

$G(u) = \frac{2}{9}\int_{-1}^{\sqrt{u}} (x + 1)\,dx = \frac{1}{9}(u + 2\sqrt{u} + 1), \quad 1 \le u \le 4$

The corresponding density functions are

(4.3.3)  $g(u) = \frac{2}{9}u^{-1/2}, \quad 0 < u < 1$
         $g(u) = \frac{1}{9}(1 + u^{-1/2}), \quad 1 < u < 4$

EXAMPLE 3 If $f(x) = \phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$ (so that $X$ is a standard normal
variate), and if $u(x) = \frac{1}{2}x^2$, the range of $u$ is from 0 to $\infty$ and its domain is from
$-\infty$ to $\infty$. Therefore,

$G(u) = \int_{-\sqrt{2u}}^{\sqrt{2u}} \phi(x)\,dx = 2\Phi(\sqrt{2u}) - 1$

and

(4.3.4)  $g(u) = \frac{dG(u)}{du} = (\pi u)^{-1/2}e^{-u}$

(see Appendix A.9 on differentiation under the sign of integration).
This distribution is a special case of the gamma distribution discussed in the
next section.


4.4 The Gamma Distribution The distribution with density function

(4.4.1)  $f(x) = \frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}, \quad 0 < x < \infty$

where $\alpha > 0$, is called the gamma distribution (see Appendix A.5). The gamma
function, $\Gamma(\alpha)$, is a kind of generalized factorial with the basic property $\Gamma(\alpha + 1)$
$= \alpha\Gamma(\alpha)$. Since $\Gamma(\frac{1}{2}) = \pi^{1/2}$ (see Appendix A.7), the gamma distribution with
parameter $\alpha = \frac{1}{2}$ is the one obtained in Example 3 above, given by Eq. (4.3.4).
The form of the distribution, for a few values of $\alpha$, is shown in Figure 23.

The m.g.f. is

(4.4.2)  $M(h) = \int_0^\infty e^{hx}f(x)\,dx$

On making the substitution $u = (1 - h)x$, and supposing that $0 < h < 1$,
we obtain

(4.4.3)  $M(h) = \frac{1}{\Gamma(\alpha)}\int_0^\infty e^{-u}u^{\alpha-1}(1 - h)^{-\alpha}\,du = (1 - h)^{-\alpha}$

by the definition of $\Gamma(\alpha)$ in Eq. (A.5.1).
The c.g.f. is therefore

(4.4.4)  $K(h) = -\alpha\log(1 - h) = \alpha\left(h + \frac{h^2}{2} + \frac{h^3}{3} + \cdots\right)$

so that the first few cumulants are

(4.4.5)  $\kappa_1 = \alpha, \quad \kappa_2 = \alpha, \quad \kappa_3 = 2\alpha, \quad \kappa_4 = 6\alpha \ldots$

and in general $\kappa_r = \alpha\Gamma(r)$. The skewness is $\kappa_3/\kappa_2^{3/2} = 2\alpha^{-1/2}$ and the kurtosis is
$\kappa_4/\kappa_2^2 = 6\alpha^{-1}$.

The gamma distribution has therefore a single parameter which is at the
same time the mean and the variance. As the parameter increases, the
distribution becomes more nearly symmetrical.

A somewhat more general two-parameter distribution, with density function

(4.4.6)  $f(x) = \frac{x^{\alpha-1}e^{-x/\beta}}{\beta^{\alpha}\Gamma(\alpha)}$

where $\alpha > 0$, $\beta > 0$, is also called a gamma distribution. The $r$th cumulant is
$\kappa_r = \alpha\beta^r\Gamma(r)$. Equation (1) corresponds to the special case $\beta = 1$.
The distribution function for the gamma variate of Eq. (6) is

(4.4.7)  $F(x) = \int_0^x f(u)\,du = \frac{1}{\Gamma(\alpha)}\int_0^{x/\beta} e^{-v}v^{\alpha-1}\,dv$

where we have made the substitution $u = \beta v$. The function of $y$ and $\alpha$, defined by

(4.4.8)  $\Gamma_y(\alpha) = \int_0^y e^{-v}v^{\alpha-1}\,dv$

is called the incomplete gamma function, and has been extensively tabulated [1].
Pearson's tables actually give the ratio of the incomplete to the complete gamma
function, namely,

(4.4.9)  $I(u, \alpha - 1) = \frac{\Gamma_y(\alpha)}{\Gamma(\alpha)}$

with $u = y\alpha^{-1/2}$.

Thus, to find the value of $F(x)$ in Eq. (7) for any given $x$, we should look up
in the tables the value of $I(x\beta^{-1}\alpha^{-1/2}, \alpha - 1)$.

It may be noted that the Poisson cumulative probability $P(c, \mu)$ (see § 3.8) is
expressible in terms of the incomplete gamma function. In fact (see Problem
B-11),

(4.4.10)  $P(c, \mu) = \frac{\Gamma_\mu(c)}{\Gamma(c)} = I(u, c - 1)$

with $u = \mu c^{-1/2}$.
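
The relation of Eq. (4.4.10) between the Poisson tail and the incomplete gamma ratio can be verified numerically. The sketch below is illustrative only: it evaluates the incomplete gamma integral by composite Simpson's rule (a choice made here for self-containment, valid when the exponent is at least 1 so that the integrand is bounded at the origin), and all function names are assumptions.

```python
import math


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def incomplete_gamma_ratio(y: float, alpha: float) -> float:
    """Ratio of incomplete to complete gamma function, for alpha >= 1."""
    integral = simpson(lambda v: math.exp(-v) * v ** (alpha - 1.0), 0.0, y)
    return integral / math.gamma(alpha)


def poisson_upper_tail(c: int, mu: float) -> float:
    """Exact P(X >= c) for a Poisson variate X with mean mu."""
    below = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(c))
    return 1.0 - below


# Both columns should agree to many decimal places.
for c, mu in ((6, 4.0), (3, 2.5), (10, 7.0)):
    print(c, mu, poisson_upper_tail(c, mu), incomplete_gamma_ratio(mu, c))
```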

It was shown in § 2.12 that in order to find the cumulant generating function
of a sum of independent variates we simply have to sum the individual c.g.f.'s.
The sum of $n$ independent gamma variates of the type described by Eq. (1), with
parameters $\alpha_1, \alpha_2 \ldots \alpha_n$, has therefore, by Eq. (4), the c.g.f.

$K(h) = -\sum\alpha_i\log(1 - h)$

and this is the c.g.f. of a gamma variate with parameter $\sum\alpha_i$. On the assumption
that a distribution is completely determined by its c.g.f., this shows that the
sum of $n$ independent gamma variates is a gamma variate. The assumption in
question is justified for the distributions likely to occur in statistical theory.


4.5 The Beta Distributions The two-parameter distribution with density
function

(4.5.1)  $f(x) = \frac{x^{\alpha-1}(1 - x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 \le x \le 1$

where $\alpha > 0$, $\beta > 0$, and $B(\alpha, \beta)$ is the beta function (see Appendix A.6), is
called the beta distribution. The somewhat similar type of distribution with
density function

(4.5.2)  $g(x) = \frac{x^{\alpha-1}(1 + x)^{-(\alpha+\beta)}}{B(\alpha, \beta)}, \quad 0 \le x < \infty$

where $\alpha > 0$, $\beta > 0$, may also be called a beta distribution, and will be referred
to as the beta-prime distribution.

Fig. 24 BETA AND BETA-PRIME DISTRIBUTIONS

The principal property of the beta function,

(4.5.3)  $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$

is proved in the Appendix. The general shape of the distributions given by
Eqs. (1) and (2) is illustrated in Figure 24, for $\alpha = 4$, $\beta = 3$. The curve of $f(x)$
is tangential to the $x$-axis at $x = 0$ and $x = 1$ if $\alpha$ and $\beta$ are both greater than 2.
The curve of $g(x)$ is tangential at $x = 0$ if $\alpha > 2$.

The $r$th moment about zero of the beta distribution is

(4.5.4)  $\mu'_r = \int_0^1 x^r f(x)\,dx = \frac{B(\alpha + r, \beta)}{B(\alpha, \beta)} = \frac{\Gamma(\alpha + r)\,\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\alpha + \beta + r)}$
         $= \frac{(\alpha + r - 1)(\alpha + r - 2)\cdots\alpha}{(\alpha + \beta + r - 1)(\alpha + \beta + r - 2)\cdots(\alpha + \beta)}$

and similarly for the beta-prime distribution

(4.5.5)  $\mu'_r = \int_0^\infty x^r g(x)\,dx = \frac{B(\alpha + r, \beta - r)}{B(\alpha, \beta)} = \frac{(\alpha + r - 1)(\alpha + r - 2)\cdots\alpha}{(\beta - 1)(\beta - 2)\cdots(\beta - r)}$

if $\beta > r$. The means of these distributions are therefore at $\alpha/(\alpha + \beta)$ and $\alpha/(\beta - 1)$
respectively. The cumulants, as far as they exist, may be calculated from Eqs. (4)
and (5) and the relations given in §§ 2.9 and 2.12.

The distribution function for the beta variate is

(4.5.6)  $F(x) = [B(\alpha, \beta)]^{-1}\int_0^x u^{\alpha-1}(1 - u)^{\beta-1}\,du$

for $0 \le x \le 1$. This integral is called the incomplete beta function, $B_x(\alpha, \beta)$,
and has been tabulated by Karl Pearson and his associates [2]. The tables give
the ratio of the incomplete to the complete beta function, namely,

(4.5.7)  $I_x(\alpha, \beta) = \frac{B_x(\alpha, \beta)}{B(\alpha, \beta)}$

and so the value of $F(x)$ in Eq. (6) can be read directly from these tables.
For the beta-prime distribution,

(4.5.8)  $G(x) = [B(\alpha, \beta)]^{-1}\int_0^x u^{\alpha-1}(1 + u)^{-\alpha-\beta}\,du$

On putting $1 + x = y^{-1}$ and $1 + u = v^{-1}$, this becomes

(4.5.9)  $G(x) = [B(\alpha, \beta)]^{-1}\int_y^1 v^{\beta-1}(1 - v)^{\alpha-1}\,dv$
         $= [B(\alpha, \beta)]^{-1}\int_0^{1-y} w^{\alpha-1}(1 - w)^{\beta-1}\,dw$
         $= I_{1-y}(\alpha, \beta)$

4.6 The Chi-Square Distribution Let $X_1, X_2 \ldots X_n$ be $n$ independent normal
variates with means $\mu_1, \mu_2 \ldots \mu_n$ and variances $\sigma_1^2, \sigma_2^2 \ldots \sigma_n^2$. Let the standardized
variate corresponding to $X_i$ be

(4.6.1)  $Z_i = \frac{X_i - \mu_i}{\sigma_i}$

Then, as shown in Example 3 of § 4.3, the variable $\frac{1}{2}Z_i^2$ has the gamma
distribution with parameter $\alpha = \frac{1}{2}$. We have also seen in § 4.4 that $\sum_i(\frac{1}{2}Z_i^2)$
is a gamma variate with parameter $\sum\alpha_i$, in this case $n/2$. If then we denote
$\sum Z_i^2$ by $\chi^2$, the variate $\chi^2/2$ has the density function

(4.6.2)  $f(\chi^2/2) = \frac{e^{-\chi^2/2}(\chi^2/2)^{(n/2)-1}}{\Gamma(n/2)}$

The density function for $\chi^2$ itself is $g(\chi^2)$, where

(4.6.3)  $g(\chi^2)\,d(\chi^2) = f(\chi^2/2)\,d(\chi^2/2)$

so that

(4.6.4)  $g(\chi^2) = \frac{(\chi^2)^{(n/2)-1}e^{-\chi^2/2}}{2^{n/2}\,\Gamma(n/2)}$

A distribution with this density function is called the chi-square ($\chi^2$)
distribution. The number $n$ is called the number of degrees of freedom. It is the
number of independent normal variates whose squares are added to produce $\chi^2$.
The chi-square distribution is an important one in statistical theory, being 
much used for testing the goodness of fit of a theoretical curve to an empirical 
distribution and for testing certain types of statistical hypotheses. Examples of 
these uses will be given later. Meanwhile we list a few properties of this distri- 
bution. 

The shape of the curve of $g(\chi^2)$, plotted against $\chi^2$, depends on the value of
$n$. The curves for different $n$ look like the gamma distributions of Figure 23.
Since the $r$th cumulant for $\chi^2/2$ is $\kappa_r = (n/2)\Gamma(r)$, the $r$th cumulant for $\chi^2$ is
$2^r\kappa_r = 2^{r-1}(r - 1)!\,n$.

The expectation, variance and skewness are therefore given by

(4.6.5)  $\kappa_1 = E(\chi^2) = n$
         $\kappa_2 = V(\chi^2) = 2n$
         $\gamma_1 = \kappa_3/\kappa_2^{3/2} = (8/n)^{1/2}$

The distribution function for $\chi^2$ is

(4.6.6)  $G(u) = \int_0^u g(\chi^2)\,d\chi^2$

In the statistical applications, we are usually interested in the area of the
chi-square curve to the right of a particular value $u$ (the shaded area in Figure 25).

Fig. 25 CHI-SQUARE DISTRIBUTION

This is equal to $1 - G(u)$. The table in Appendix B gives values of $u$
corresponding to selected values of $1 - G(u)$ for all $n$ from 1 to 30. More extensive tables
may be found in [3].

The function $G(u)$ is converted by the transformation $\chi^2 = 2v$ into an
incomplete gamma function:

(4.6.7)  $G(u) = \frac{1}{\Gamma(n/2)}\int_0^{u/2} v^{(n/2)-1}e^{-v}\,dv = \frac{\Gamma_{u/2}(n/2)}{\Gamma(n/2)} = I\{u(2n)^{-1/2}, (n/2) - 1\}$

by Eq. (4.4.9), so that the tables [1] may also be used to find $G(u)$.

For large $n$, it was shown by Fisher that $(2\chi^2)^{1/2} - (2n - 1)^{1/2}$ is
approximately a standard normal variate, so that if we need values beyond the scope
of the table in the Appendix we can put

(4.6.8)  $(2\chi^2)^{1/2} \approx z + (2n - 1)^{1/2}$

where $z$ is given by $\Phi(z) = G(\chi^2)$.

Thus, suppose in Figure 25 that $n = 30$ and the shaded area is 0.05. The
value of $u$ given by the table of $\chi^2$ is 43.773. The corresponding $z$, for $\Phi(z) = 0.95$,
is 1.645, so that

$(2\chi^2)^{1/2} = 1.645 + (59)^{1/2} = 9.326$

which gives $\chi^2 = 43.49$.

A still better approximation is that of Wilson and Hilferty [4], namely,

$\left(\frac{\chi^2}{n}\right)^{1/3} \approx 1 - \frac{2}{9n} + z\left(\frac{2}{9n}\right)^{1/2}$

For the case $n = 30$, $z = 1.645$, this gives $(\chi^2/30)^{1/3} = 1 - 1/135 + 1.645/(135)^{1/2}$
$= 1.1341$, from which $\chi^2 = 43.76$. This is very close to the true value.
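
The two approximations can be compared side by side for the case worked in the text. This is a small sketch under the stated formulas; the function names are hypothetical, and the tabulated value 43.773 is taken from the text rather than computed.

```python
import math


def fisher_chi_square(z: float, n: int) -> float:
    """Approximate chi-square value from Fisher's form, Eq. (4.6.8)."""
    return 0.5 * (z + math.sqrt(2.0 * n - 1.0)) ** 2


def wilson_hilferty_chi_square(z: float, n: int) -> float:
    """Approximate chi-square value from the Wilson-Hilferty cube-root form."""
    t = 2.0 / (9.0 * n)
    return n * (1.0 - t + z * math.sqrt(t)) ** 3


n, z, tabulated = 30, 1.645, 43.773
print("Fisher:         ", round(fisher_chi_square(z, n), 2))
print("Wilson-Hilferty:", round(wilson_hilferty_chi_square(z, n), 2))
print("Tabulated:      ", tabulated)
```

Carrying full precision through the cube gives a Wilson-Hilferty value slightly above the 43.76 quoted in the text, which was obtained from the rounded intermediate figure 1.1341.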


* 4.7 Theorems on the Chi-Square Distribution The following theorems are
sometimes useful in establishing the distributions of particular statistics. The
proofs are either sketched briefly or are omitted altogether.

THEOREM 4.1 If $Y_i$ ($i = 1, 2 \ldots n$) is one of a set of orthogonal linear
functions (see Appendix A.10) of the independent variates $X_j$ ($j = 1, 2 \ldots n$), and
if the $X_j$ are normal with mean 0 and variance 1, then the distribution of $\sum_i Y_i^2$ is
chi-square with $n$ degrees of freedom.

We first note that the distribution of any one of the $Y_i$ is normal. This
follows from Eq. (2.11.8) and the assumption that the distribution is determined
by its cumulant generating function. The further assumption that the different
$Y_i$ are orthogonal to each other implies that they are independent and that

(4.7.1)  $\sum_i Y_i^2 = \sum_j X_j^2$

and, since $\sum_j X_j^2$ is a chi-square variate, by § 4.6, so is $\sum_i Y_i^2$.

Examples of orthogonal linear transformations are

(4.7.2)  $Y_1 = 2^{-1/2}(X_1 + X_2)$
         $Y_2 = 2^{-1/2}(X_1 - X_2)$

and

(4.7.3)  $Y_1 = 2^{-1}(X_1 + X_2 + X_3 + X_4)$
         $Y_2 = 2^{-1/2}(X_1 - X_2)$
         $Y_3 = 6^{-1/2}(X_1 + X_2 - 2X_3)$
         $Y_4 = 12^{-1/2}(X_1 + X_2 + X_3 - 3X_4)$

For each $Y_i$ the sum of the squares of the coefficients of the $X_j$ is 1, and for any
two different $Y$'s the sum of the products of the coefficients, pair by pair, is 0.
This is the distinguishing characteristic of an orthogonal linear transformation.

The fact that the sum of the squares of the coefficients of $Y_i$ is 1 shows that
the variance of $Y_i$ is 1 (the same as the variance of the $X_j$). Also the expectation
of $Y_i$ is obviously 0, so that the $Y_i$ are standard normal variates. Each $Y_i^2$ is
therefore a chi-square variate with one degree of freedom. Note that if
$Y_1 = n^{-1/2}\sum X_j$, then $Y_1^2 = (\sum X_j)^2/n = n\bar{X}^2$.
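
The defining properties of an orthogonal linear transformation can be checked mechanically for the four-variate example. The sketch below assumes the coefficients of Eq. (4.7.3) are the usual Helmert-type ones shown in the code; the names `COEFFS`, `is_orthogonal` and `transform` are illustrative.

```python
import math
import random

# Rows of coefficients for Y1..Y4 in terms of X1..X4 (assumed form of Eq. 4.7.3).
COEFFS = [
    [c / math.sqrt(4.0) for c in (1, 1, 1, 1)],
    [c / math.sqrt(2.0) for c in (1, -1, 0, 0)],
    [c / math.sqrt(6.0) for c in (1, 1, -2, 0)],
    [c / math.sqrt(12.0) for c in (1, 1, 1, -3)],
]


def dot(u, v) -> float:
    """Sum of products of corresponding coefficients."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal(rows, tol: float = 1e-12) -> bool:
    """True if each row has unit sum of squares and distinct rows have zero product sum."""
    for i, r in enumerate(rows):
        for j, s in enumerate(rows):
            target = 1.0 if i == j else 0.0
            if abs(dot(r, s) - target) > tol:
                return False
    return True


def transform(rows, x):
    """Apply the linear transformation to the vector x."""
    return [dot(r, x) for r in rows]


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(4)]
y = transform(COEFFS, x)
print(is_orthogonal(COEFFS))
print(sum(v * v for v in x), sum(v * v for v in y))   # the two sums of squares agree
```

The first transformed variate squared equals four times the square of the sample mean, which is the n = 4 case of the note above.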


THEOREM 4.2 The sum of two independent chi-square variates with $n_1$ and $n_2$
degrees of freedom is a chi-square variate with $n_1 + n_2$ degrees of freedom.

This follows from the corresponding property for gamma variates.

THEOREM 4.3 (Fisher's Theorem) If $A = \sum_{j=1}^{n} X_j^2$ and $B = \sum_{i=1}^{h} Y_i^2$,
where the $Y_i$ are orthogonal linear functions of the independent standard normal
variates $X_j$, then $A - B$ is a chi-square variate with $n - h$ degrees of freedom, and
is independent of $B$.

Note that by Theorem 4.2 and the distribution of $Y_i^2$, the quantity $B$ is a
chi-square variate with $h$ degrees of freedom. Since $A = \sum_{i=1}^{n} Y_i^2$, the difference
$A - B$ is a sum of $n - h$ of the $Y_i^2$ and is therefore a chi-square variate with
$n - h$ degrees of freedom. Fisher's theorem states that $A - B$ and $B$ are
distributed independently [5].


THEOREM 4.4 (Cochran's Theorem) If $A = \sum_{j=1}^{n} X_j^2$ and if $A = q_1 + q_2$
$+ \ldots + q_k$ where the $q$'s are quadratic forms in the $X_j$ with $n_1, n_2 \ldots n_k$ degrees
of freedom respectively, then a necessary and sufficient condition that the $q$'s are
independent chi-square variates with $n_1, n_2, \ldots n_k$ degrees of freedom is

$n_1 + n_2 + \ldots + n_k = n$

A quadratic form is an expression of the type $q = \sum a_{ij}X_iX_j$, where the $a_{ij}$
are real numbers. To say that the form has $r$ degrees of freedom (or is of rank $r$)
means that the largest non-zero determinant which can be formed from the
matrix $a_{ij}$ has $r$ rows and columns. See Appendix A.20 and reference [6].

EXAMPLE 4 If $\bar{X} = (1/n)\sum X_j$, then as shown above, $n\bar{X}^2$ is a chi-square
variate with one d.f. We may write

(4.7.4)  $A = \sum X_j^2 = \sum(X_j - \bar{X})^2 + n\bar{X}^2$

It follows that $\sum_j (X_j - \bar{X})^2$, which is $n - 1$ times the sample variance, is
distributed as $\chi^2$ with $n - 1$ degrees of freedom, independently of $n\bar{X}^2$.


* 4.8 The Log-Normal Distribution It sometimes happens that if a variate $X$
(which takes only positive values) is markedly skew in its distribution, $\log X$ is
much more nearly normal. This may be tested readily by plotting the cumulative
percentage frequency for a good-sized sample against the corresponding $X$ on
special logarithmic probability graph paper. This paper has a logarithmic scale
along one axis and a probability scale (like that in Figure 20) along the other. If
the resulting points lie nearly on a straight line, the distribution of $X$ in the
population may be taken as log-normal.

Some examples of distributions which have been found to be nearly log- 
normal are the sizes of silver particles in a photographic emulsion, the survival 
times of bacteria in given strengths of disinfectant, the effective lengths of life of 
some types of industrial equipment, the blood pressures of human beings, the 
magnitudes of maximum annual floods for a given river, and even the numbers of 
words in a sentence written by George Bernard Shaw. 

Let $Y = \log_e X$, and suppose that the distribution of Y is normal with mean
$\alpha$ and variance $\beta$. Let f(x) and g(y) be the density functions for X and Y
respectively. Then

(4.8.1)  $f(x)\,dx = g(y)\,dy = g(y)\,\frac{1}{x}\,dx$

so that

(4.8.2)  $f(x) = x^{-1} g(y) = x^{-1} (2\pi\beta)^{-1/2} e^{-(y-\alpha)^2/2\beta}$

The $r$th moment about 0 of the variate X is

(4.8.3)  $\mu'_r = \int_0^{\infty} x^r f(x)\,dx = \int_{-\infty}^{\infty} e^{ry} g(y)\,dy$

since $x = e^y$ if $y = \log_e x$. Carrying out the integration, we obtain

(4.8.4)  $\mu'_r = \exp(r\alpha + \tfrac{1}{2} r^2 \beta)$

The mean of X is therefore

(4.8.5)  $\mu'_1 = \mu = \exp(\alpha + \tfrac{1}{2}\beta)$

and the variance is

(4.8.6)  $\mu'_2 - (\mu'_1)^2 = \sigma^2 = \exp(2\alpha + 2\beta) - \exp(2\alpha + \beta) = \mu^2 \eta^2$

where $\eta^2 = e^{\beta} - 1$. The quantity $\eta$ is the ratio $\sigma/\mu$, which is called the coefficient
of variation. The skewness of the distribution is given by

(4.8.7)  $\gamma_1 = \eta^3 + 3\eta$

If $Y = \log_{10} X = c \log_e X$, where c = 0.4343 approximately, and if $\alpha$ and
$\beta$ now refer to $\log_{10} X$, the Eqs. (5), (6) and (7) will need to be modified by
writing $\alpha/c$ for $\alpha$ and $\beta/c^2$ for $\beta$.
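
The moment formulas above lend themselves to a quick numerical check. The sketch below evaluates Eqs. (4.8.5) to (4.8.7) for natural logarithms and compares the mean and variance with a simulated sample; the parameter values, sample size and seed are assumptions made purely for illustration.

```python
import math
import random

def lognormal_summary(alpha, beta):
    """Mean, variance, coefficient of variation and skewness of X when
    log_e X is normal with mean alpha and variance beta (Eqs. 4.8.5-4.8.7)."""
    mean = math.exp(alpha + beta / 2.0)
    eta = math.sqrt(math.exp(beta) - 1.0)      # coefficient of variation
    variance = (mean * eta) ** 2               # sigma^2 = mu^2 * eta^2
    skewness = eta ** 3 + 3.0 * eta
    return mean, variance, eta, skewness

alpha, beta = 1.0, 0.25                        # illustrative parameter values
mean, variance, eta, skewness = lognormal_summary(alpha, beta)

# Simulate X = exp(Y) with Y normal, and compare sample moments.
rng = random.Random(1)                         # arbitrary fixed seed
xs = [math.exp(rng.gauss(alpha, math.sqrt(beta))) for _ in range(100000)]
sim_mean = sum(xs) / len(xs)
sim_var = sum((x - sim_mean) ** 2 for x in xs) / len(xs)

print(mean, sim_mean)
print(variance, sim_var)
print(eta, skewness)
```

The simulated mean and variance should agree with the formula values to within sampling error.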

Various modifications of the simple log-normal distribution have been
suggested. A full discussion may be found in [7], and a table of critical values of
the distribution in [8]. The logarithmic transformation is often used to stabilize
variance in situations where the observational data fall into groups with different
means and where in each group the standard deviation is roughly proportional
to the mean. The transformed variates will in this case have approximately
constant variance.


4.9 Families of Theoretical Distributions The process of curve-fitting was 
at one time very popular among statisticians—much more so than it is today— 
and whole families of theoretical distributions were invented to fit (as it was 
hoped) almost any kind of empirical distribution that might turn up. One such 
family (including eight principal types of curve and a variety of special cases) 
was devised by Karl Pearson. Another idea, due to the Danish mathematician
Gram and the Swedish astronomer Charlier, was to use the normal distribution,
modified by adding terms proportional to the 1st, 2nd, 3rd . . . derivatives of the
normal density function.
The coefficients of these terms turn out to be either zero or else simply expressible 
in terms of the cumulants of the distribution. A brief discussion of the Pearson 
family of curves and of the Gram-Charlier series may be found in [9]. 


4.10 The Central Limit Theorem We close this chapter with a short des- 
cription of a famous theorem which plays a central role in the theory of statistical 
inference, and accounts very largely for the importance of the normal distri- 
bution in theoretical investigations. 

Let $X_1, X_2, \ldots, X_n$ be independent random variates all having the same
distribution with mean $\mu$ and variance $\sigma^2$, but not necessarily normal. Let the
standardized variate corresponding to $X_j$ be

(4.10.1)  $Z_j = \dfrac{X_j - \mu}{\sigma}$

and let $Y_n$ be defined by

(4.10.2)  $Y_n = \bar{Z}\, n^{1/2}$

where $\bar{Z}$ is the arithmetic mean of the $Z_j$. Then the theorem (in its simplest and
least general form) states that as $n \to \infty$, $Y_n$ tends to a standard normal variate,
or, in symbols,

(4.10.3)  $P(Y_n < y) \to \Phi(y)$

The point of the theorem is that, no matter what the original distribution of
$X_j$ may be (provided of course that $X_j$ possesses a mean and variance), the mean
of a large enough sample will have a nearly normal distribution.
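
The statement of the theorem can be illustrated by simulation. The sketch below is an assumption-laden example rather than part of the theory: it takes the $X_j$ to be exponential variates with $\mu = \sigma = 1$ (a markedly skew distribution), and the choice of $y$, the sample sizes, the number of repetitions and the seed are all arbitrary.

```python
import math
import random

def phi(y):
    """Standard normal distribution function Phi(y)."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def standardized_mean(rng, n):
    """Y_n = sqrt(n) * mean(Z_j) for exponential X_j with mu = sigma = 1."""
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total / n - 1.0) * math.sqrt(n)

rng = random.Random(2)     # arbitrary fixed seed
reps = 20000               # number of simulated samples for each n
y = 1.0                    # point at which P(Y_n < y) is estimated
freq = {}
for n in (1, 5, 50):
    hits = sum(standardized_mean(rng, n) < y for _ in range(reps))
    freq[n] = hits / reps
    print(n, freq[n], phi(y))
```

For $n = 1$ the observed frequency estimates $1 - e^{-2}$, the exact exponential probability; as $n$ grows it should settle close to $\Phi(1)$.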

The cumulant generating function for $Z_j$ will be

(4.10.4)  $K_Z(h) = \dfrac{h^2}{2} + O(h^3)$*

since the coefficients of $h$ and of $h^2/2!$ are 0 and 1 respectively. Since $Y_n$ is a
linear function of the $Z_j$, with coefficients all equal to $n^{-1/2}$, the c.g.f. for $Y_n$ will
be

(4.10.5)  $K_Y(h) = \sum_{j=1}^{n} K_Z(h n^{-1/2}) = n\left[\dfrac{h^2}{2n} + O(h^3 n^{-3/2})\right] = \dfrac{h^2}{2}$ + terms of order $n^{-1/2}$

As $n \to \infty$, $K_Y(h) \to h^2/2$, which is the c.g.f. for a standard normal variate. This
suggests the result, which is indeed true, that

(4.10.6)  $\lim_{n \to \infty} P\left(n^{1/2}\, \dfrac{\bar{X} - \mu}{\sigma} < y\right) = \Phi(y)$
It is not necessary that the $X_j$ should all have the same distribution. If
$E(X_j) = \mu_j$ and $V(X_j) = \sigma_j^2$, and if $M_n = \sum_j \mu_j$ and $S_n^2 = \sum_j \sigma_j^2$, then (as
proved by Lindeberg)

(4.10.7)  $\lim_{n \to \infty} P\left(\dfrac{\sum_j X_j - M_n}{S_n} < y\right) = \Phi(y)$

provided that the following condition holds for every $\varepsilon > 0$:

(4.10.8)  $\lim_{n \to \infty} \dfrac{1}{S_n^2} \sum_{j=1}^{n} \int (x - \mu_j)^2 f_j(x)\,dx = 0$

where $f_j(x)$ is the density function for $X_j$ and where the integral is taken over all
values of x such that $|x - \mu_j| > \varepsilon S_n$. This condition implies that $S_n \to \infty$ but
$\sigma_j/S_n \to 0$, as $n \to \infty$, for every value of j. In other words, the total sum of
variances tends to infinity but the proportional contribution of each individual
variate to this sum tends to zero. If the variate $X_j$ is discrete instead of continuous
the integral in Eq. (8) must be replaced by a sum.

*The notation $O(h^3)$ means terms of the order of $h^3$. This includes all terms proportional
to $h^3$ or to any higher power of h.

Lyapunov proved that if the absolute third moment exists for each of the
$X_j$, so that for any n a finite $R_n$ exists, given by

(4.10.9)  $R_n^3 = \sum_{j=1}^{n} \int_{-\infty}^{\infty} |x - \mu_j|^3 f_j(x)\,dx$

then the condition for the central limit theorem, Eq. (7), to hold is the simple one

(4.10.10)  $\lim_{n \to \infty} \dfrac{R_n}{S_n} = 0$

It is not even necessary that the $X_j$ should be independent. It is sufficient that
$X_i$ and $X_j$ should be independent for $|i - j| > m$, where m is some fixed number.
This means that if the variates are arranged in some natural order, consecutive or 
nearly consecutive members may be dependent, provided that all widely separated 
ones are independent. 

Sums of random variables whose distributions do not have a finite second 
moment may not show any tendency to approach normality. If the X; have a 
Cauchy distribution, given by 


(4.10.11)  $f(x) = [\pi(1 + x^2)]^{-1}$

then the distribution of $\bar{X}$ $(= n^{-1} \sum_j X_j)$ is the same Cauchy distribution, no
matter how large n may be.
For a fuller discussion of the Central Limit Theorem see [10] and [11]. 
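
The behaviour of Cauchy means described above can be seen in a small simulation. This is only a sketch: the variates are generated by inverting the Cauchy distribution function, the spread of the sample means is summarized by their interquartile range (which equals 2 for the density in Eq. (4.10.11), whose quartiles are at $\pm 1$), and the sample sizes, repetition count and seed are illustrative assumptions.

```python
import math
import random
import statistics

def cauchy(rng):
    """One draw from the density 1 / (pi * (1 + x^2)), obtained by
    inverting the Cauchy distribution function."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(rng, n, reps=4000):
    """Interquartile range of the means of `reps` samples of size n."""
    means = [sum(cauchy(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(3)     # arbitrary fixed seed
spreads = {n: iqr_of_means(rng, n) for n in (1, 10, 100)}
for n, spread in spreads.items():
    print(n, spread)
```

If the means were approaching normality with shrinking variance, the interquartile range would fall roughly like $n^{-1/2}$; instead it should stay near 2 for every n.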


PROBLEMS


A. (§§ 4.1–4.3)
1. Show that the m.g.f. for the uniform distribution on the interval (0, 1) has the
form $M(h) = 1 + h/2! + h^2/3! + \cdots$. Write down the expression for the c.g.f.,
expand it as far as the term in $h^4$, and so obtain the first four cumulants.

2. What transformation will change the variate X to one having a uniform
distribution on (0, 1), if the density function for X is $f(x) = (x - 1)/2$, $1 \le x \le 3$, and
$f(x) = 0$ for $x < 1$ and $x > 3$?

3. If $f(x) = 2x e^{-x^2}$, $x > 0$, find the density function for U, where $U = X^2$.

4. If X has the density function f(x), x > 0, what is the density function for 


EV 
=| , when U <u. To 


U = aX? + b, where a > 0? ни: ао x«( 


1 Gli), use Appendix А.Э. 
5. If X has the density function $f(x) = 2x$, $0 \le x \le 1$, find the distribution
of $U = (3X - 1)^2$. Hint: For $U < u$, $0 \le u \le 1$, X goes from $(1 - u^{1/2})/3$ to
$(1 + u^{1/2})/3$; for $1 \le u \le 4$, X goes from 0 to $(1 + u^{1/2})/3$.

6. If X has the continuous distribution function F(x), what are the distribution
functions of (a) $e^X$ (b) sin X (c) F(X)? Hint: $P(e^X < x) = P(X < \log x) = F(\log x)$.


1. Show that (") Pa + 1) _ 1 
d Tra ОГ r+) Baar LA 


has a single mode at x = Bla — 1) 


‚„ 2+ Sho 
iasi ecu the gamma distribution, Eq. (4.4.6), 
3. н реа the x-axis at the origin if « > 2. 
(a) [= * s gamma functions the following integrals: 
1x2/3 dy = _ 
m dx (b) |2 e0 + xI2 dx. Hint: In (b) put + x/2 = »/6. 
* Find the const Р Г < 
tKifK м Р 
use Eq, (4.6.3) ant КИК | 2 + 29777 dz = 1. Hint: Put z = tan б and 
$ Show that 


xa 2/2 Р 
cos? 0 40 = | sin? 0 40 = ш = 1 B 
6. Us 9 о = *2, 
€ Eq. (A.6.4) to show that 
1 ym-1 + хт! 
В = Ща 
Bins (т, п) |. ü yr" x 
Wie Eq. Ба 
n = sid pan Чие the domain of integration into two parts, O to 1 and 1 to oo. 
, Prov put y = I/x. 
leg is var BAB, а) = Bla B) — Bie B) an 
ва 8, Show epis In Eq. (4.5.6), put и = 1—2. 
КУШ * = the expectation of the positive square root of a beta variate with 
the positive Bis Гы > б э Ex p Mi and that the expectation 
р 9. The ha Square root of à one-parameter gamma vatlate is Pa + PAT). 
= tation of ux T mean of a variate X may defined as the reciprocal of the ex- 
л> 0, sho X. If the probability density for Xisf/ (X) = хп + 0,0 € x « 
Fing the | ES that the harmonic mean of X is equal to Л. 
harmonic mean of the distribution with density 


d that therefore 16, о) = 


ml п> 2. 


@ ж = Ai 
and M itself а gamm 
= x is given by 
1a + 0] 


-a-i, 


f = y” 
(x) = х n-i(1 4 х)-"-"/ Ви, iij; 
mean M, a variate with 


10. If у; 
X is a Poisson variate with 
bability that x 


Para, 
meters æ and B, show that the pro 
Р(Х = х) = Г + apr 
" cx BC + В) 
sion of (1 + В) — Bl; 


Since 
ie this j | Wet 
he фы for each x(0, 1,2...) a term im te үзү! T1 Hint: Integrate the joint 


dig Sttibution i 
рш ол of vs y] “negative binomia 
"ELS + and M øver all values ога. " be expressed in 
has the sumulati robability P(c« м) mas ^ 
the cumulative Poisson P y Pe е, p) = THO = 


Ay, OF the | ; de 
ме 5, 6 incomplete notion by the ей 
tanger ce ie amma ось ‘неона may Ue WP en 
ht "ET 
$f(a + h) = f(a) + h f'(a) + \cdots + \dfrac{h^{c-1}}{(c-1)!} f^{(c-1)}(a) + \dfrac{1}{(c-1)!} \int_0^h f^{(c)}(a + h - t)\, t^{c-1}\,dt$

Put $f(x) = e^x$, $a = 0$, $h = \mu$, and divide through by $e^{\mu}$. Note that

$\sum_{x=0}^{c-1} e^{-\mu} \mu^x / x! = 1 - P(c, \mu)$.


C. (§§ 4.6–4.9)

1. Show that the c.g.f. of a standardized one-parameter gamma variate tends to the
value $h^2/2$ as the parameter tends to infinity. Hence show that the c.g.f. of the
standardized chi-square distribution tends to this value as $n \to \infty$. (See Problem D.13 of
Chapter 3.)

2. Prove that if $t = (\chi^2 - n)/(2n)^{1/2}$, the density function for t is given by
$f(t) = K(t + c)^{c^2 - 1} e^{-ct}$, $-c < t < \infty$, where $c^2 = n/2$ and $K = c^{c^2} e^{-c^2}/\Gamma(c^2)$.
(This distribution is known as Pearson's Type III. See § 4.9. The skewness is
$2/c = (8/n)^{1/2}$.)

3. Show that the probability that $\chi^2 > c$ may be written $1 - I(c/(2n)^{1/2}, (n - 2)/2)$,
where I is Pearson's incomplete gamma function.

4. If y dx is the probability that X lies between x and x + dx and if y is given by
the solution of the differential equation $dy/dx = y(a - x)/(bx + c)$, show that (for
suitable values of the constants a, b, c) a certain linear function of X has the $\chi^2$
distribution with n degrees of freedom, where $n = 2(1 + a/b + c/b^2)$. Hint: The arbitrary
constant in the solution of the differential equation is determined by the condition
$\int y\,dx = 1$, from $-c/b$ to $\infty$. Put $V = 2(bX + c)/b^2$ and show that V is a $\chi^2$-variate.

5. If $Y = \log_e X$, and Y is a standard normal variate, write down the density
function for X, and calculate the expectation and variance of X by integration.

6. If $\log_e X$ is normally distributed with mean 1 and variance 4, calculate the
probability that X lies between 1/2 and 2.

Chapter 5
SAMPLE AND POPULATION


5.1 Inferences from Sample to Population As we have already seen, the 
data in most statistical problems relate to a sample drawn from some parent 
population, or universe (as it is sometimes called). Various characteristics of 
the sample, such as the mean, median, standard deviation or skewness, may be 
calculated from the data, and they serve to give a concise description of the 
sample itself. Their more important use, however, is to enable us to make 
statements about the population. Such statements, of course, being of the 
nature of inductive inferences, cannot be made with complete certainty, but 
only with more or less probability of being true. Nevertheless it is worth while 
to be able to state, for example, that the mean of a particular population may be 
taken as lying between 21.7 and 25.8, with a probability of 0.90 that this state- 
ment is true. We shall see in the present chapter how some estimates of this sort 
are arrived at. 

The population characteristics in which we are interested are usually para- 
meters which occur in the distribution of some variate. If, for example, the 
population is assumed to be normal, as far as a particular variate is concerned, 
the density function for this variate will contain two parameters, $\mu$ and $\sigma$, which
are the population mean and standard deviation respectively. These may be 
estimated from the characteristics of a sample, such as the median and the 
range, for instance, or the sample mean and the sample standard deviation. 

When a sample is used to make inferences about the population, we generally 
assume that the sample is random. This usually means (when the population is 
finite) that every individual in the population has an equal chance of being 
included in the sample. More generally, if X is the random variable which is 
under consideration and which has a distribution function F(x) in the population, 
and if $X_1, X_2, \ldots, X_N$ are measured values of X on sample items from the
population, the sampling is random if all the $X_i$ ($i = 1, 2, \ldots, N$) are independent
random variables (see § 1.13), each with the same distribution function as X
itself. The probability that the observed sample has values equal to or less than
$x_1, x_2, \ldots, x_N$ for the respective items is then $F(x_1) F(x_2) \cdots F(x_N)$.

It is usually desirable that sampling should be as nearly random as possible, 
although this is often hard to achieve in practice. Even if the sampling is not 
purely random, it is still possible to make valid inferences, provided that the 
respective probabilities of being included in the sample are known for all 
members of the population. In a scheme described as stratified sampling, for 
instance, the whole population is divided into classes (or strata), each of which is
sampled separately. The sizes of the various strata must, however, be known, and
within each stratum the sampling must be random. For valid statistical inferences 
there must always be present somewhere this quality of randomness. A fuller 
discussion of some common sampling procedures will be found in Chapter 7. 


5.2 Point Estimation and Interval Estimation Sampling theory deals with 
questions like the following: given a random sample of N variates from a 
certain population, what can we say about the parameters that define the 
distribution of such variates within the population? There are two distinct 
questions that we may ask about any one parameter, namely, what is the best 
value to use for it and how reliable is this best value? The first question is one 
of point estimation—we want a single value which in some sense is the “best” 
estimate we can make of the parameter (various criteria are possible for judging 
the goodness of an estimate and they do not always agree in their choice of the 
best). The second question has to do with the interval in which we can confi- 
dently expect the true but unknown value of the parameter to lie, and is said to 
be a problem of interval estimation. We may, for instance, be able to say on the 
basis of a sample that the best estimate we can make of the population mean is
159 lb and that we feel 90% confident that the true value is somewhere between
150 lb and 168 lb. This interval (150 lb to 168 lb) is called a confidence interval,
with confidence coefficient 90%, or 0.90. The confidence interval is a random
variate, calculated from the sample and having a probability distribution, 
whereas of course the population mean, although unknown, is not a random 
variate at all in the usual sense. We should not therefore speak of the probability 
that the population mean lies in a given interval but rather of the probability that 
the given interval includes the population mean. 

To say that a confidence interval for a parameter has a confidence coefficient of 
0.90 means that the statement “this interval includes the true value" has a proba- 
bility equal to 0.90 of being correct. In other words, if we continue to make 
similar statements on the basis of many other samples from the same population, 
using the same estimation procedure, about 90 % of these statements will be true. 
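
The frequency interpretation of the confidence coefficient can be illustrated by repeated sampling. The sketch below makes several assumptions that are not in the text: the population is normal with a known standard deviation, the interval is the usual one centred on the sample mean, and the numerical values (mean 159, standard deviation 20, samples of 16) are hypothetical.

```python
import math
import random

Z90 = 1.6449   # upper 5% point of the standard normal distribution

def interval(sample, sigma):
    """90% confidence interval for the mean when sigma is known."""
    n = len(sample)
    m = sum(sample) / n
    half_width = Z90 * sigma / math.sqrt(n)
    return m - half_width, m + half_width

rng = random.Random(4)                 # arbitrary fixed seed
mu, sigma, N = 159.0, 20.0, 16         # hypothetical population and sample size
reps = 20000
covered = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(N)]
    low, high = interval(sample, sigma)
    covered += low <= mu <= high       # does this interval include the true mean?
coverage = covered / reps

print(coverage)
```

The population mean is fixed throughout; it is the interval that varies from sample to sample, and the proportion of intervals that include the mean should be close to 0.90.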

The concept of confidence intervals is one of the main contributions to 
statistical theory by J. Neyman and E. S. Pearson [1]. A somewhat different 
concept, leading in many cases (although not in all) to identical results, is that of 
fiducial intervals, due to Sir Ronald A. Fisher [2]. In this view it is permissible to
attach a fiducial probability $f(\theta)$ to the parameter $\theta$, although this is not to be
interpreted in the ordinary (frequency) sense of probability. The idea is that
$\int_{\theta_1}^{\theta_2} f(\theta)\,d\theta$ is a measure of our belief that $\theta$ lies between $\theta_1$ and $\theta_2$ (the Latin
word "fiducia" means trust). The fiducial probability, like the confidence
interval, is calculated from the known sampling distribution of the statistic used to
estimate $\theta$ (see § 5.4).


5.3 Confidence Belts  For simplicity we consider a population defined by a
single parameter $\theta$ and we suppose that a statistic T, derived from a sample of
size N, is used to estimate $\theta$. The statistic T is often called an estimator of $\theta$.

The distribution of T, for given $\theta$, is supposedly known. That is, for any
admissible value of $\theta$ (say in the range from $\alpha$ to $\beta$) we can calculate the
probability that T will lie between two given values, $t_1$ and $t_2$.

In the diagram, Figure 26, t is plotted as abscissa and $\theta$ as ordinate.
The possible values of T for a sample drawn from a population with a given
value of $\theta$, say $\theta'$, lie along the line AB. On this line we can mark two points, at
$t_1$ and $t_2$, such that the probability that $T < t_1$ is a fixed value $\varepsilon$, say 0.05, and
the probability that $T > t_2$ is also $\varepsilon$. If $F(t|\theta)$ is the distribution function
for T, with given $\theta$, these probability statements may be written

(5.3.1)  $F(t_1|\theta) = \varepsilon$,  $F(t_2|\theta) = 1 - \varepsilon$

where it is assumed that $0 < \varepsilon < \tfrac{1}{2}$.

FIG. 26 CONFIDENCE BELT


If we now imagine that there are a great many hypothetical populations
with values of $\theta$ between $\alpha$ and $\beta$, and that for each one the appropriate values
of $t_1$ and $t_2$ are calculated, the points so obtained will lie on curves something
like those marked $C_{\varepsilon}$ and $C_{1-\varepsilon}$ in the diagram. Since t is supposed to be an
estimate of $\theta$, it is reasonable to assume that both curves represent one-valued
monotone-increasing functions. (If t is any sort of a reasonable estimate, it
should increase as $\theta$ increases.)

The region bounded by the two curves and by the lines $\theta = \alpha$ and $\theta = \beta$ is
called a confidence belt, with confidence coefficient $1 - 2\varepsilon$. This belt can
theoretically be constructed from a knowledge of the function $F(t|\theta)$ alone.

Now suppose that for one particular random sample (of size N) we obtain a
value $t_0$ of T. The value of $\theta$ for the population is unknown, except for the fact
that it must lie between $\alpha$ and $\beta$. If at $t_0$ we draw an ordinate cutting the curves
$C_{\varepsilon}$ and $C_{1-\varepsilon}$ at $\theta = \theta_2$ and $\theta = \theta_1$ respectively, then all points on this ordinate
between $\theta_1$ and $\theta_2$ lie inside the confidence belt. We see that $\theta_1$ is the lower
bound of values of $\theta$ such that $F(t_0|\theta) < 1 - \varepsilon$, and $\theta_2$ is the upper bound of
values of $\theta$ such that $F(t_0|\theta) > \varepsilon$. We can therefore assert, on the basis of our
sample value $t_0$, that $\theta$ lies between $\theta_1$ and $\theta_2$, and the probability that this
claim is true is $1 - 2\varepsilon$. The values $\theta_1$ and $\theta_2$ are the lower and upper confidence
limits for $\theta$, corresponding to the observed $t_0$, and $1 - 2\varepsilon$ is the confidence
coefficient. The smaller the value of $\varepsilon$ the more confidence we shall feel in the
rightness of our claim, but of course the smaller we make $\varepsilon$ the wider will our
belt become, and therefore the greater will be the interval $\theta_2 - \theta_1$. We can
increase our confidence in a statement only by making the statement vaguer.

In the above illustration it was assumed, for convenience, that the variable T
concerned was continuous. If the variable is discrete, the curves $C_{\varepsilon}$ and $C_{1-\varepsilon}$
will be stepped, as in Figure 27, which relates to the Poisson distribution of X.
For any given $\mu$, there will in general be a value of $x_2$ such that $P(x_2, \mu) < \varepsilon$
and $P(x_2 - 1, \mu) > \varepsilon$, where $P(x_2, \mu)$ is the cumulative Poisson function

$P(x_2, \mu) = \sum_{x = x_2}^{\infty} e^{-\mu} \dfrac{\mu^x}{x!}$

It will happen for some values of $\mu$ that there is an $x_2$ such that $P(x_2, \mu) = \varepsilon$
exactly. As $\mu$ increases through such a value, $x_2$ jumps by a unit. The horizontal
portions of the stepped curve represent these values of $\mu$. Similar considerations
apply to the curve of $x_1$, which is such that $P(x_1, \mu) > 1 - \varepsilon$ and
$P(x_1 + 1, \mu) < 1 - \varepsilon$.

The diagram (for $\varepsilon = 0.025$) shows that if a single sample value of 21 is
observed for a Poisson variate X, the population mean $\mu$ may be taken, with 95%
confidence, to lie between 13 and 32.
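
The limits read from the diagram can also be computed directly. In the sketch below the lower limit is taken as the value of $\mu$ for which the probability of observing 21 or more is $\varepsilon$, and the upper limit as the value for which the probability of observing 21 or fewer is $\varepsilon$. The function names, the bisection routine and the search ranges are illustrative choices of our own, not part of the text.

```python
import math

def poisson_cdf(x, mu):
    """P(X <= x) for a Poisson variate with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return total

def solve(func, lo, hi, tol=1e-10):
    """Bisection for a root of func, which must change sign on [lo, hi]."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def poisson_limits(x, eps):
    """Confidence limits for the Poisson mean from one observed count x > 0,
    with probability eps cut off in each tail."""
    # Lower limit: P(X >= x | mu) = eps.
    lower = solve(lambda mu: 1.0 - poisson_cdf(x - 1, mu) - eps, 1e-9, float(x))
    # Upper limit: P(X <= x | mu) = eps.
    upper = solve(lambda mu: poisson_cdf(x, mu) - eps, float(x), 10.0 * x + 50.0)
    return lower, upper

lower, upper = poisson_limits(21, 0.025)
print(lower, upper)
```

For an observed count of 21 the two limits should fall close to the values 13 and 32 quoted from the diagram.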


* 5.4 Fiducial Inference  If $F(t|\theta)$ is the distribution function of T, and if $t_k$ is
a value such that

(5.4.1)  $P(T < t_k) = F(t_k|\theta) = k$

then in a fraction $1 - k$ of all samples drawn from a population with parameter $\theta$
the statistic T will exceed the critical value $t_k$. This value $t_k$ is a function of $\theta$,
say $K(\theta)$, and $\theta$ is the inverse function of $t_k$, say $K^{-1}(t_k)$. Equation (1) may
therefore be written in either form:

(5.4.2)  $P(T < K(\theta)) = k$
or
(5.4.3)  $P(\theta > K^{-1}(T)) = k$

provided $K(\theta)$ is a strictly monotone-increasing function of $\theta$.

The form of Eq. (3) is the one preferred by Fisher and expresses what he calls 
a fiducial probability for 0. This does not depend on any assumption about the 
distribution of 0 prior to the examination of any samples. 

If we suppose that the statistic T has a continuous distribution, then, as we
have seen in § 4.2, the transformed variate

(5.4.4)  $Y = F(T|\theta)$

has a uniform (rectangular) distribution on the interval 0 to 1. This means
that for any fixed number k, between 0 and 1,

(5.4.5)  $P(F(T|\theta) < k) = k$

FIG. 28 FIDUCIAL INFERENCE

Let us assume that the possible values of T form an interval (a, b) and that
the possible values of $\theta$ form an interval $(\alpha, \beta)$.

For any given $\theta$ we can plot $F(t|\theta)$ against t, as in Figure 28; $F(t|\theta)$ is the
probability that $T < t$. In most cases we shall find that for a fixed t, say $t_0$, the
values of F decrease as $\theta$ increases. The nearer $\theta$ is to $\alpha$, the nearer $F(t_0|\theta)$ will
be to 1, and the nearer $\theta$ is to $\beta$ the nearer $F(t_0|\theta)$ to 0. If so, the equation
$F(t_0|\theta) = k$ determines uniquely a value $\theta_k$ such that $\theta > \theta_k$ when $F(t_0|\theta) < k$.

Equation (5) may therefore be written

(5.4.6)  $P(\theta > \theta_k) = k = F(t_0|\theta_k)$
or
(5.4.7)  $P(\theta < \theta_k) = 1 - k = 1 - F(t_0|\theta_k)$

The quantity $1 - F(t_0|\theta_k)$ is the fiducial distribution function for $\theta$. Actually $\theta_k$
in Eq. (6) is a random variable, determined by the relation $F(t_0|\theta_k) = k$ for a
given k and for the observed value $t_0$ of the random variable T. The probability
statement really concerns this random variable $\theta_k$. By twisting the inequality
from the form in Eq. (2) to the form in Eq. (3) we can make a probability
statement apparently about $\theta$, but this does not convert $\theta$ into a random variable
(see further in [3]).


5.5 Confidence and Significance The determination of confidence intervals 
is closely related to the estimation of significance. A problem that sometimes 
arises in statistics is that of judging whether a population parameter differs 
appreciably from some value which has been fixed beforehand, perhaps from 
some purely theoretical considerations. Suppose the theoretical value is $\theta_0$ and
the point estimate from a sample is $\hat{\theta}$. We need to assess the significance of the
difference $\hat{\theta} - \theta_0$. If this difference is greater (numerically) than a certain
amount we shall say that the difference is significant, if less, that it is
non-significant. Obviously, it is impossible to draw a hard-and-fast line between
significance and non-significance (there will be border-line cases which are
difficult to classify), but statisticians in general accept the following convention:
if the probability of obtaining by chance a sample with a difference numerically
as great as $\hat{\theta} - \theta_0$ is less than 0.05, the observed difference is significant; if the
probability is less than 0.01, the difference is highly significant; if the probability
is greater than 0.05 the difference is non-significant. In border-line cases the
statistician will usually prefer to suspend judgment and perhaps try to get a 
larger sample. 

If, having obtained the sample estimate $\hat{\theta}$, we calculate the corresponding
95% confidence interval for $\theta$, stretching say from $\theta_1$ to $\theta_2$, there will be a 5%
probability (if $\theta_0$ is indeed the true value) that this interval will not include $\theta_0$.
In other words, if $\theta_0$ lies outside the 95% confidence interval the difference
$\hat{\theta} - \theta_0$ will be regarded as significant. Similarly if $\theta_0$ lies outside the 99%
confidence interval the difference will be considered highly significant.

Sometimes the statistician is faced with the question of significance for the
difference of the estimates given by two separate samples. In such a case he will
choose zero as the hypothetical value for this difference, and test whether the 
observed difference is significantly different from zero. A method of doing this 
is to construct the confidence interval for the observed difference and see 
whether this interval includes the value zero. 

Some more general considerations on the testing of hypotheses will be 
discussed in Chapter 6. 


5.6 Desirable Properties of an Estimator  Let us suppose that we are using
the statistic T, derived from a random sample of N observations $x_1, x_2, \ldots, x_N$, in
order to estimate the parameter $\theta$ which occurs in the distribution function of the
population. The estimator T is said to be consistent if, as N increases indefinitely,
T tends (stochastically) to the value $\theta$. That is, for any given $\varepsilon > 0$,

(5.6.1)  $P(|T - \theta| > \varepsilon) \to 0$ as $N \to \infty$


This is an obvious common-sense requirement. We should expect a very 
large sample to give us practically the population value of the quantity we are 
trying to estimate. 

A simple test for determining consistency is provided by Chebyshev's
inequality, § 2.16. If T is such that $E(T) \to \theta$ and $V(T) \to 0$ as $N \to \infty$, then it
follows from Eq. (2.16.1) that T is a consistent estimator of $\theta$.
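
The argument from Chebyshev's inequality can be illustrated numerically. In the sketch below T is the mean of N variates uniform on (0, 1), so that $E(T) = 1/2$ and $V(T) = 1/(12N)$; we assume the inequality in the form $P(|T - \theta| > \varepsilon) \le V(T)/\varepsilon^2$ for an unbiased T. The value of $\varepsilon$, the sample sizes, the repetition count and the seed are arbitrary.

```python
import random

def exceed_freq(rng, n, eps, reps=2000):
    """Observed frequency of |mean - 1/2| > eps for samples of size n
    drawn from the uniform distribution on (0, 1)."""
    count = 0
    for _ in range(reps):
        m = sum(rng.random() for _ in range(n)) / n
        count += abs(m - 0.5) > eps
    return count / reps

rng = random.Random(5)     # arbitrary fixed seed
eps = 0.05
results = {}
for n in (10, 100, 1000):
    variance = 1.0 / (12.0 * n)               # V(T) for the uniform mean
    bound = min(1.0, variance / eps ** 2)     # Chebyshev upper bound
    results[n] = (exceed_freq(rng, n, eps), bound)
    print(n, results[n])
```

Both the observed frequency and the bound fall towards zero as N increases; the bound is crude, but it is enough to establish consistency.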

The estimator T is said to be unbiased if (even for finite N), $E(T) = \theta$, whatever
other parameters may occur in the distribution function. If E(T) merely tends
to $\theta$ as $N \to \infty$, T is asymptotically unbiased. It is generally desirable to use an
unbiased estimator where possible, but sometimes other considerations are more
important.

The reliability of the estimate furnished by an estimator is measured by the 
reciprocal of its sampling variance. The smaller this variance the more reliable 
the estimate will be. The efficiency of the estimator T is given by comparing the
variance of T with that of the estimator $T_0$ which, of all possible consistent
statistics which might be used to estimate $\theta$, is the one with minimum variance. That
is, the efficiency of $T = V(T_0)/V(T)$. A statistic with an efficiency of 1 (usually
expressed as 100%) is said to be most efficient.
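
As an illustration of efficiency, the sampling variances of the mean and the median may be compared by simulation. The sketch assumes a standard normal population, for which the sample mean is taken as the minimum-variance estimator $T_0$ of the population mean, and uses the median as the competing estimator T. The sample size, number of repetitions and seed are illustrative.

```python
import random
import statistics

def sampling_variance(estimator, rng, n, reps):
    """Variance of an estimator over `reps` samples of size n drawn
    from a standard normal population."""
    values = [estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
              for _ in range(reps)]
    return statistics.pvariance(values)

rng = random.Random(6)     # arbitrary fixed seed
n, reps = 101, 5000
v_mean = sampling_variance(statistics.fmean, rng, n, reps)     # V(T0)
v_median = sampling_variance(statistics.median, rng, n, reps)  # V(T)
efficiency = v_mean / v_median

print(v_mean, v_median, efficiency)
```

For large samples from a normal population the efficiency of the median is known to approach $2/\pi$, about 64%, and the simulated ratio should be in that neighbourhood.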

We shall now consider some estimators which are used to estimate the 
moments, cumulants and other parameters of a population. It will be con- 
venient to start with a finite population. 


5.7 Sampling from a Finite Population Many of the results of sampling 
theory can be obtained by supposing that a random sample of size N is drawn 
from a population of size M. This enables us to use the theory of combinations. 
Results for an infinite population can usually be obtained by letting $M \to \infty$.

If X is the variate measured, the arithmetic mean of X for a sample of size N is

(5.7.1)  $m = N^{-1} \sum_{j=1}^{N} X_j$

where $X_j$ is the $j$th item in the sample.

The corresponding quantity for the population is

(5.7.2)  $\mu = M^{-1} \sum_{\alpha=1}^{M} X_{\alpha}$

where $X_{\alpha}$ is the $\alpha$th item in the population. Some of the $X_{\alpha}$ will of course be the
same as the $X_j$. We shall use $X_{\alpha}$, however, to mean any item from the population
and $X_j$ to mean one that is also in the sample.

As an estimator of $\mu$, m is clearly consistent, since when N becomes equal to
M (it cannot get any larger than M), m becomes equal to $\mu$. With a finite
population, the expectation of a statistic such as m is defined as its average over all
possible different samples of size N that could be drawn from the population.
The number of these samples is $\binom{M}{N}$. We shall now show that this average for
m is equal to $\mu$, and therefore m is an unbiased estimator of $\mu$.


TABLE 5.1
Sample No. | Sample Items | Mean (m)

     1     |    2, 5      |    3.5
     2     |    2, 5      |    3.5
     3     |    2, 7      |    4.5
     4     |    2, 10     |    6.0
     5     |    2, 21     |   11.5
     6     |    5, 5      |    5.0
     7     |    5, 7      |    6.0
     8     |    5, 7      |    6.0
     9     |    5, 10     |    7.5
    10     |    5, 10     |    7.5
    11     |    5, 21     |   13.0
    12     |    5, 21     |   13.0
    13     |    7, 10     |    8.5
    14     |    7, 21     |   14.0
    15     |   10, 21     |   15.5
   Total   |              |  125.0


The number of samples in which any particular $X_{\alpha}$ occurs is equal to
$\binom{M-1}{N-1}$, since the remaining $N - 1$ items in the sample can be picked from
any of the other $M - 1$ items in the population. This $X_{\alpha}$ contributes $X_{\alpha}/N$ to
the value of m for each sample in which it occurs, and therefore its contribution
to the average m over all samples is $\dfrac{X_{\alpha}}{N} \binom{M-1}{N-1} \Big/ \binom{M}{N} = X_{\alpha}/M$. Summing
over all $\alpha$, we obtain for the average m the amount $\sum_{\alpha} X_{\alpha}/M$, which is $\mu$.
Therefore,

(5.7.3)  $E(m) = \mu$

As an illustration involving small numbers, suppose M = 6 and N = 2.
Then $\binom{M}{N} = 15$. Let the values of $X_{\alpha}$ in the population be 2, 5, 5, 7, 10 and 21.
The population mean is $\mu = 50/6 = 8.33$. The 15 possible samples of size 2 and
their separate values of m are given in Table 5.1. The average m over all 15
samples is 125/15 = 8.33, which is the same as $\mu$. (In this illustration two of
the population items have the same value of X, but they count as different
items in enumerating the samples.)

A precisely similar proof may be carried through for the $p$th moment of X
about the origin, denoted by $m'_p$ for the sample and by $\mu'_p$ for the population.

A more convenient notation for $m'_p$, suggested by Tukey [4], is the angle
bracket $\langle p \rangle$. With this notation,

(5.7.4)  $\langle p \rangle = N^{-1} \sum_{j} X_j^p$

(5.7.5)  $E(\langle p \rangle) = \mu'_p = M^{-1} \sum_{\alpha} X_{\alpha}^p$
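
The averaging over all possible samples can be carried out mechanically. The sketch below enumerates the 15 samples of size 2 from the population of Table 5.1 and checks Eqs. (5.7.3) and (5.7.5) in exact rational arithmetic; the function names are our own, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 5, 5, 7, 10, 21]    # the six items used in Table 5.1
N = 2                                # sample size

def bracket(items, p):
    """Angle bracket <p>: the mean of the p-th powers of the items."""
    return Fraction(sum(x ** p for x in items), len(items))

# combinations() distinguishes items by position, so the two 5's count
# as different items, as in the text.
samples = list(combinations(population, N))

def expectation(statistic):
    """Average of a statistic over all possible samples of size N."""
    return sum(statistic(s) for s in samples) / len(samples)

mu = bracket(population, 1)                        # population mean
E_m = expectation(lambda s: bracket(s, 1))         # average of m over samples
E_2 = expectation(lambda s: bracket(s, 2))         # average of <2> over samples

print(len(samples), E_m, mu)
print(E_2, bracket(population, 2))
```

Working with `Fraction` avoids rounding, so the two sides of each equation can be compared for exact equality rather than to two decimal places.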


Let us now consider a pair of items $X_{\alpha}$, $X_{\beta}$ from the population. The
subscripts distinguish them as different items, but their actual values may happen to
be the same (like the two 5's in the illustration above). Each such pair appears in
$\binom{M-2}{N-2}$ different samples. We define the angle bracket $\langle pq \rangle$ and the
corresponding population parameter $\mu'_{pq}$ by

(5.7.6)  $\langle pq \rangle = [N(N - 1)]^{-1} \sum_{i \ne j} X_i^p X_j^q$

(5.7.7)  $\mu'_{pq} = [M(M - 1)]^{-1} \sum_{\alpha \ne \beta} X_{\alpha}^p X_{\beta}^q$

where the sum in Eq. (6) is over the $N(N - 1)$ pairs $X_i$, $X_j$ in a single sample of
size N. The sign $\sum_{\ne}$ here indicates that the sum is to be taken over all different
values of the subscripts.

By considering the contribution of each pair of items $X_{\alpha}$, $X_{\beta}$ to the average of
$\langle pq \rangle$, we can readily obtain the result

(5.7.8)  $E(\langle pq \rangle) = \mu'_{pq}$

The angle brackets such as $\langle pq \rangle$ are therefore unbiased estimators of the
corresponding population parameters. This is true also of brackets with three,
four, or more symbols.

* 5.8 Fisher's k-Statistics Unfortunately, the sample moments about the 
mean (of second or higher orders) are not unbiased estimators of the corres- 
ponding population parameters. However we can define a set of statistics, called 
k-statistics, each of which does have this relationship to the corresponding 
population parameter. When the population is infinite, these parameters 
become identical with the cumulants discussed in § 2.12.


104 INTRODUCTION TO STATISTICAL INFERENCE 5.8 


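In order to calculate the k-statistics for a sample systematically, it is
convenient to start with sums of powers of the $X_j$. Let

(5.8.1)
    $s_1 = \sum X_j = N\langle 1 \rangle$
    $s_2 = \sum X_j^2 = N\langle 2 \rangle$
    $s_3 = \sum X_j^3 = N\langle 3 \rangle$
    etc.

Then, as shown in Appendix A.11,

(5.8.2)
    $N(N - 1)\langle 11 \rangle = s_1^2 - s_2$
    $N(N - 1)\langle 12 \rangle = s_1 s_2 - s_3$
    $N(N - 1)\langle 13 \rangle = s_1 s_3 - s_4$
    $N(N - 1)\langle 22 \rangle = s_2^2 - s_4$
    etc.

Also,

(5.8.3)
    $(N - 2)\langle 111 \rangle = s_1\langle 11 \rangle - 2\langle 12 \rangle$
    $(N - 2)\langle 112 \rangle = s_1\langle 12 \rangle - \langle 22 \rangle - \langle 13 \rangle$
    $(N - 3)\langle 1111 \rangle = s_1\langle 111 \rangle - 3\langle 112 \rangle$
    etc.

The k-statistics may then be defined in terms of these brackets:

(5.8.4)
    $k_1 = \langle 1 \rangle$
    $k_2 = \langle 2 \rangle - \langle 11 \rangle$
    $k_3 = \langle 3 \rangle - 3\langle 12 \rangle + 2\langle 111 \rangle$
    $k_4 = \langle 4 \rangle - 4\langle 13 \rangle - 3\langle 22 \rangle + 12\langle 112 \rangle - 6\langle 1111 \rangle$

Generalized k-statistics [4] may similarly be defined, such as

(5.8.5)
    $k_{11} = \langle 11 \rangle$
    $k_{12} = \langle 12 \rangle - \langle 111 \rangle$
    $k_{22} = \langle 22 \rangle - 2\langle 112 \rangle + \langle 1111 \rangle$
    etc.

These serve as checks on the calculation of the k-statistics, since the following
relations hold:

(5.8.6)
    $k_{11} = k_1^2 - \dfrac{k_2}{N}$
    $k_{12} = k_1 k_2 - \dfrac{k_3}{N}$
    $k_{22} = \dfrac{N - 1}{N + 1}\left(k_2^2 - \dfrac{k_4}{N}\right)$

Each k is an unbiased estimator of the corresponding quantity for the
population, which will be denoted by $\kappa'$. The $\kappa'$'s are defined like the k's, except
that $\langle p \rangle$ is replaced by $\mu'_p$, $\langle pq \rangle$ by $\mu'_{pq}$, etc. For an infinitely large population
they become identical with the cumulants as previously defined.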

The k-statistics are expressible in terms of the sample moments about the
mean, discussed in § 2.8. The relations are:


(5.8.7)    $k_2 = \frac{N}{N-1}\, m_2$
           $k_3 = \frac{N^2}{(N-1)(N-2)}\, m_3$
           $k_4 = \frac{N^2}{(N-1)(N-2)(N-3)} \left[ (N+1)m_4 - 3(N-1)m_2^2 \right]$

It is not, however, necessary to find the moments first. The systematic
procedure of Eqs. (1) to (4) will give the k-statistics directly.

The $\kappa'_r$ are similarly expressible in terms of the population moments
(with M substituted for N). Thus


(5.8.8)    $\kappa'_2 = \frac{M}{M-1}\, \mu_2$
           $\kappa'_3 = \frac{M^2}{(M-1)(M-2)}\, \mu_3$
           $\kappa'_4 = \frac{M^2}{(M-1)(M-2)(M-3)} \left[ (M+1)\mu_4 - 3(M-1)\mu_2^2 \right]$


When $M \to \infty$, $\kappa'_2 \to \mu_2$, $\kappa'_3 \to \mu_3$, and
$\kappa'_4 \to \mu_4 - 3\mu_2^2$, in agreement with the
definitions given in Eq. (2.12.5).
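
The equivalence of the bracket route, Eq. (5.8.4), and the moment route, Eq. (5.8.7), can be checked numerically. The following is a minimal sketch rather than a recommended algorithm: the function names are hypothetical, the brackets are computed by brute force directly from their definition (feasible only for very small samples), and the six values 2, 5, 5, 7, 10, 21 are treated here purely for illustration as a sample.

```python
from itertools import permutations
from math import isclose

def bracket(xs, powers):
    """Angle bracket <p q ...>: average of X_i^p * X_j^q * ... over all
    ordered selections of distinct subscripts (brute force)."""
    total = 0.0
    count = 0
    for idx in permutations(range(len(xs)), len(powers)):
        term = 1.0
        for i, p in zip(idx, powers):
            term *= xs[i] ** p
        total += term
        count += 1
    return total / count

def k_from_brackets(xs):
    """k_1 .. k_4 as defined in Eq. (5.8.4)."""
    b = lambda *powers: bracket(xs, powers)
    k1 = b(1)
    k2 = b(2) - b(1, 1)
    k3 = b(3) - 3 * b(1, 2) + 2 * b(1, 1, 1)
    k4 = b(4) - 4 * b(1, 3) - 3 * b(2, 2) + 12 * b(1, 1, 2) - 6 * b(1, 1, 1, 1)
    return k1, k2, k3, k4

def k_from_moments(xs):
    """k_1 .. k_4 from the sample moments about the mean, Eq. (5.8.7)."""
    n = len(xs)
    mean = sum(xs) / n
    m2, m3, m4 = (sum((x - mean) ** r for x in xs) / n for r in (2, 3, 4))
    k2 = n / (n - 1) * m2
    k3 = n ** 2 / ((n - 1) * (n - 2)) * m3
    k4 = n ** 2 / ((n - 1) * (n - 2) * (n - 3)) * ((n + 1) * m4 - 3 * (n - 1) * m2 ** 2)
    return mean, k2, k3, k4

# Illustrative data only (an assumption, not a sample analysed in the text).
sample = [2.0, 5.0, 5.0, 7.0, 10.0, 21.0]
from_brackets = k_from_brackets(sample)
from_moments = k_from_moments(sample)
for name, u, v in zip(("k1", "k2", "k3", "k4"), from_brackets, from_moments):
    print(name, round(u, 6), round(v, 6))
```

The two columns printed should agree to rounding error, since both routes define the same statistics.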


* 5.9 Computation of the k-Statistics  As an illustration of the arithmetic
involved, we will consider the data of Table 2.2 already used to calculate some of
the moments in Chapter 2. If we use an auxiliary variable $u = (x_i - 45.5)/4$,
where $x_i$ is a class-mark, then u takes only integral values from -4 to 5 and the
calculations of sums of powers are greatly shortened. The whole work can be
carried through in terms of u, and at the end we can convert back to the original
x values.

For this sample we first find


    $N = 1000$,   $S_1 = \sum f u = 553$
                  $S_2 = \sum f u^2 = 2471$
                  $S_3 = \sum f u^3 = 4105$
                  $S_4 = \sum f u^4 = 18{,}407$


Then

    $\langle 1 \rangle = 0.553$
    $\langle 2 \rangle = 2.471$
    $\langle 3 \rangle = 4.105$
    $\langle 4 \rangle = 18.407$

    $\langle 11 \rangle = \frac{(553)^2 - 2471}{999{,}000} = 0.3036$

    $\langle 12 \rangle = \frac{(553)(2471) - 4105}{999{,}000} = 1.3637$

    $\langle 13 \rangle = \frac{(553)(4105) - 18{,}407}{999{,}000} = 2.2539$

    $\langle 22 \rangle = \frac{(2471)^2 - 18{,}407}{999{,}000} = 6.0935$

    $\langle 111 \rangle = \frac{(553)(0.3036) - 2.7274}{998} = 0.1655$

    $\langle 112 \rangle = \frac{(553)(1.3637) - 6.0935 - 2.2539}{998} = 0.7473$

    $\langle 1111 \rangle = \frac{(553)(0.1655) - 2.2419}{997} = 0.0896$

(5.9.1)    $k_1 = 0.553$
           $k_2 = 2.471 - 0.3036 = 2.1674$
           $k_3 = 4.105 - 4.0911 + 0.3310 = 0.345$
           $k_4 = 18.407 - 9.016 - 18.280 + 8.968 - 0.538 = -0.459$

           $k_{11} = 0.3036$
           $k_{12} = 1.3637 - 0.1655 = 1.1982$
           $k_{22} = 6.0935 - 1.4946 + 0.0896 = 4.6885$


and the checks of Eq. (5.8.6) hold, apart perhaps from a small rounding-off
error in the last decimal place.
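
The arithmetic above follows Eqs. (5.8.1) to (5.8.5) step by step, so it can be reproduced in a few lines. The sketch below is illustrative only: the function name and the dictionary keys are hypothetical, and the inputs are the power sums quoted above for the auxiliary variable u.

```python
from math import isclose

def k_statistics_from_power_sums(n, s1, s2, s3, s4):
    """k-statistics and generalized k-statistics via Eqs. (5.8.1)-(5.8.5)."""
    b1, b2, b3, b4 = s1 / n, s2 / n, s3 / n, s4 / n       # (5.8.1)
    nn1 = n * (n - 1)
    b11 = (s1 ** 2 - s2) / nn1                            # (5.8.2)
    b12 = (s1 * s2 - s3) / nn1
    b13 = (s1 * s3 - s4) / nn1
    b22 = (s2 ** 2 - s4) / nn1
    b111 = (s1 * b11 - 2 * b12) / (n - 2)                 # (5.8.3)
    b112 = (s1 * b12 - b22 - b13) / (n - 2)
    b1111 = (s1 * b111 - 3 * b112) / (n - 3)
    return {
        "k1": b1,                                         # (5.8.4)
        "k2": b2 - b11,
        "k3": b3 - 3 * b12 + 2 * b111,
        "k4": b4 - 4 * b13 - 3 * b22 + 12 * b112 - 6 * b1111,
        "k11": b11,                                       # (5.8.5)
        "k12": b12 - b111,
        "k22": b22 - 2 * b112 + b1111,
    }

N = 1000
k = k_statistics_from_power_sums(N, 553, 2471, 4105, 18407)

# The checks of Eq. (5.8.6), written as residuals that should vanish.
check_11 = k["k11"] - (k["k1"] ** 2 - k["k2"] / N)
check_12 = k["k12"] - (k["k1"] * k["k2"] - k["k3"] / N)
check_22 = k["k22"] - (N - 1) / (N + 1) * (k["k2"] ** 2 - k["k4"] / N)

for name, value in k.items():
    print(name, round(value, 4))
```

Carrying full precision through the intermediate brackets, rather than four decimals, is what makes the three residuals vanish to machine accuracy instead of to "the last decimal place."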
Finally we can convert the k's back to the original units (pounds) by writing


(5.9.2)    $k_1 = 4(0.553) + 45.5 = 47.71$ lb
           $k_2 = 4^2(2.1674) = 34.68$ lb$^2$
           $k_3 = 4^3(0.345) = 22.1$ lb$^3$
           $k_4 = -4^4(0.459) = -118$ lb$^4$


Using these as estimators of the cumulants for the population from which the
sample was taken, we have the following estimated values of the population
parameters:

(5.9.3)    $\kappa_1 = \mu = 47.71$ lb
           $\kappa_2 = \sigma^2 = 34.68$ lb$^2$   ($\sigma = 5.889$ lb)
           $\kappa_3 / \kappa_2^{3/2} = \gamma_1 = 0.108$
           $\kappa_4 / \kappa_2^{2} = \gamma_2 = -0.098$


* 5.10 Sheppard's Corrections  The error due to grouping the frequencies
at the mid-points of the class-intervals, in the computation of the k-statistics,
may be approximately allowed for by using some corrections first suggested by
Sheppard. These corrections are applied to the even-order k-statistics only, and
are given by the relation


(5.10.1)    $(k_r)_c = k_r - \frac{B_r}{r}\, c^r, \qquad r \ge 2$


where $(k_r)_c$ is the corrected value of $k_r$, c is the class-interval, and
$B_r$ is the r-th Bernoulli number (see Appendix A.12). For the first two
even-order k-statistics these corrections are


(5.10.2)    $(k_2)_c = k_2 - \frac{c^2}{12}$
            $(k_4)_c = k_4 + \frac{c^4}{120}$

In practice the corrections are most easily applied to the $k_r$, as first
obtained in the u units (for which c = 1). Thus from Eq. (5.9.1) the corrected
values are


    $(k_2)_c = 2.1674 - 0.0833 = 2.0841$
    $(k_4)_c = -0.459 + 0.008 = -0.451$


Using these, our estimated $\kappa_2$ and $\kappa_4$ become

    $\kappa_2 = 33.34$ lb$^2$,   $\kappa_4 = -115$ lb$^4$


and instead of the values given in Eq. (5.9.3) we find the following estimates:


(5.10.3)    $\sigma = 5.774$ lb
            $\gamma_1 = 0.114$
            $\gamma_2 = -0.104$
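
As a rough check on these figures, the corrections of Eq. (5.10.1) can be applied programmatically. This is a sketch under stated assumptions: the Bernoulli numbers are taken in the sign convention $B_2 = 1/6$, $B_4 = -1/30$ (the convention that reproduces Eq. (5.10.2)), the helper name is hypothetical, and the inputs are the rounded u-unit values from Eq. (5.9.1).

```python
from math import sqrt

# Bernoulli numbers in the convention B_2 = 1/6, B_4 = -1/30 (assumed).
BERNOULLI = {2: 1 / 6, 4: -1 / 30}

def sheppard_corrected(k_r, r, c=1.0):
    """Eq. (5.10.1): (k_r)_c = k_r - B_r * c**r / r, for even r."""
    return k_r - BERNOULLI[r] * c ** r / r

# Rounded k-statistics in u units, from Eq. (5.9.1); class-interval c = 1.
k2_u, k3_u, k4_u = 2.1674, 0.345, -0.459
k2_c = sheppard_corrected(k2_u, 2)
k4_c = sheppard_corrected(k4_u, 4)

scale = 4.0                        # one u unit corresponds to 4 lb
kappa2 = scale ** 2 * k2_c         # lb^2
kappa4 = scale ** 4 * k4_c         # lb^4
sigma = sqrt(kappa2)               # lb
gamma1 = k3_u / k2_c ** 1.5        # dimensionless, so u units suffice
gamma2 = k4_c / k2_c ** 2

print(round(k2_c, 4), round(k4_c, 3))
print(round(kappa2, 2), round(kappa4), round(sigma, 3))
print(round(gamma1, 3), round(gamma2, 3))
```

Because the inputs are already rounded, the last digit of some results may differ slightly from the values quoted in Eq. (5.10.3).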


Sheppard's corrections should not be used unless the frequency curve appears
to have a single mode and tails off gradually at both ends. Moreover, unless the 
sample consists of at least several hundred items, the uncertainties due to 
sampling fluctuation are likely to overshadow the corrections. When the cor- 
rections are applicable, however, their use will generally (although not invariably) 
improve the estimates of the population parameters, and they are so easy to 
apply that it is usually worth while to take the slight additional trouble involved. 


* 5.11 Variance and Covariance of the k-Statistics  As before, when dealing
with a finite population, we interpret the expectation of a statistic as its average
taken over all possible different samples of the same size. The variance of
$k_1$ will then be defined as


(5.11.1)    $V(k_1) = E(k_1^2) - \{E(k_1)\}^2$

By the results obtained in § 5.8 we know that


(5.11.2)    $E(k_1) = \kappa'_1 = \mu$

Also, from the first equation of (5.8.6),

(5.11.3)    $k_1^2 = k_{11} + \frac{k_2}{N}$

so that

(5.11.4)    $E(k_1^2) = E(k_{11}) + N^{-1} E(k_2) = \kappa'_{11} + N^{-1} \kappa'_2$

It follows that


(5.11.5)    $V(k_1) = \kappa'_{11} + N^{-1} \kappa'_2 - (\kappa'_1)^2$

The corresponding equation to (3) for the population is

(5.11.6)    $(\kappa'_1)^2 = \kappa'_{11} + \kappa'_2 / M$

so that Eq. (5) becomes

(5.11.7)    $V(k_1) = \kappa'_2 (N^{-1} - M^{-1})$


For an infinitely great population, $M^{-1} \to 0$, and we have the simple result


(5.11.8)    $V(k_1) = \kappa_2 N^{-1} = \frac{\sigma^2}{N}$

This measures the sampling fluctuation in the value of the arithmetic mean $k_1$.
In practice we generally do not know $\sigma^2$ except insofar as we can estimate it
from the one sample which gives us $k_1$. If we replace $\sigma^2$ by the corresponding
unbiased estimator $k_2$, we have as an estimator of the variance of $k_1$ the statistic


(5.11.9)    $\hat{V}(k_1) = \frac{k_2}{N}$


The square root of $\hat{V}(k_1)$ is called the standard error of $k_1$. In terms of the
sums of powers of $X_j$ defined in § 5.8,


(5.11.10)    $\hat{V}(k_1) = \frac{1}{N}\left[ \frac{S_2}{N} - \frac{S_1^2 - S_2}{N(N-1)} \right] = \frac{S_2 - S_1^2/N}{N(N-1)}$


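The variance of $k_2$ is similarly given by

(5.11.11)    $V(k_2) = E(k_2^2) - \{E(k_2)\}^2$


From the last equation of (5.8.6),


    $E(k_2^2) = \frac{N+1}{N-1} E(k_{22}) + \frac{1}{N} E(k_4) = \frac{N+1}{N-1}\, \kappa'_{22} + \frac{1}{N}\, \kappa'_4$

and also,

    $E(k_2) = \kappa'_2$

Since

    $\kappa'_{22} = \frac{M-1}{M+1} \left[ (\kappa'_2)^2 - \frac{\kappa'_4}{M} \right]$


we find, after a little rearrangement, that


(5.11.12)    $V(k_2) = \frac{M-N}{(N-1)(M+1)} \left[ 2(\kappa'_2)^2 + \left( 1 - \frac{1}{M} - \frac{1}{N} - \frac{1}{MN} \right) \kappa'_4 \right]$

which for an infinitely great population reduces to


(5.11.13)    $V(k_2) = \frac{2}{N-1}\, \kappa_2^2 + \frac{\kappa_4}{N}$

If the parent population is normal, $\kappa_4 = 0$ and $\kappa_2 = \sigma^2$. In this case,

(5.11.14)    $V(k_2) = \frac{2\sigma^4}{N-1}$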
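
For a small finite population the expectation "over all possible different samples" can be taken literally, which gives an exact check on Eqs. (5.11.7) and (5.11.12). The sketch below is illustrative only: the helper names are hypothetical, the six-item population 2, 5, 5, 7, 10, 21 and the sample size 4 are chosen for convenience, and the two 5's are counted as different items.

```python
from itertools import combinations
from math import isclose

def k2_statistic(xs):
    """Second k-statistic of a sample: sum of squared deviations / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def kappa_primes(pop):
    """kappa'_2 and kappa'_4 of a finite population, from Eq. (5.8.8)."""
    M = len(pop)
    mu = sum(pop) / M
    mu2 = sum((x - mu) ** 2 for x in pop) / M
    mu4 = sum((x - mu) ** 4 for x in pop) / M
    kp2 = M / (M - 1) * mu2
    kp4 = M ** 2 / ((M - 1) * (M - 2) * (M - 3)) * ((M + 1) * mu4 - 3 * (M - 1) * mu2 ** 2)
    return kp2, kp4

def variance_over(values):
    """Variance of a statistic over the list of all equally likely samples."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

population = [2, 5, 5, 7, 10, 21]        # illustrative population (assumed)
M, N = len(population), 4
samples = list(combinations(population, N))   # all 15 different samples

k1_values = [sum(s) / N for s in samples]
k2_values = [k2_statistic(s) for s in samples]
kp2, kp4 = kappa_primes(population)

v_k1_enum = variance_over(k1_values)
v_k1_formula = kp2 * (1 / N - 1 / M)                               # (5.11.7)
mean_k2 = sum(k2_values) / len(k2_values)                          # should equal kappa'_2
v_k2_enum = variance_over(k2_values)
v_k2_formula = (M - N) / ((N - 1) * (M + 1)) * (
    2 * kp2 ** 2 + (1 - 1 / M - 1 / N - 1 / (M * N)) * kp4)        # (5.11.12)

print(v_k1_enum, v_k1_formula)
print(mean_k2, kp2)
print(v_k2_enum, v_k2_formula)
```

Each printed pair should agree to rounding error; the middle pair illustrates that $k_2$ is an unbiased estimator of $\kappa'_2$.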

In order to find the standard error of $k_2$, the unknown population parameters
must be replaced by sample estimators. It is easily verified that, for
$M \to \infty$,


(5.11.15)    $E\left\{ \frac{2}{N+1}\, k_2^2 + \frac{N-1}{N(N+1)}\, k_4 \right\} = \frac{2}{N-1}\, \kappa_2^2 + \frac{\kappa_4}{N}$


so that we can take the expression in braces on the left-hand side as an unbiased
estimator of $V(k_2)$. Therefore,


(5.11.16)    $\hat{V}(k_2) = \frac{2}{N+1}\, k_2^2 + \frac{N-1}{N(N+1)}\, k_4$


and the square root of this is the standard error of $k_2$.
The covariance of $k_1$ and $k_2$ may be defined as


(5.11.17)    $C(k_1, k_2) = E(k_1 k_2) - E(k_1) \cdot E(k_2)$

By the second equation of (5.8.6),


    $E(k_1 k_2) = E(k_{12}) + \frac{E(k_3)}{N}$

so that, since $\kappa'_1 \kappa'_2 = \kappa'_{12} + \kappa'_3 / M$ for the population,

(5.11.18)    $C(k_1, k_2) = \kappa'_{12} + \frac{\kappa'_3}{N} - \kappa'_1 \kappa'_2$
                          $= \frac{\kappa'_3}{N} - \frac{\kappa'_3}{M}$
                          $= \kappa'_3 (N^{-1} - M^{-1})$

For an infinitely great population,

(5.11.19)    $C(k_1, k_2) = \kappa_3 N^{-1}$


which is zero for a normal population.

The first two k-statistics are therefore uncorrelated in samples from a normal
population, although this is not true for skew populations. Since $k_3$ is an
unbiased estimator of $\kappa_3$,


(5.11.20)    $\hat{C}(k_1, k_2) = k_3 N^{-1}$

5.12 The Distribution of the Sample Mean  The arithmetic mean of a
sample of N observations, $X_1, X_2, \ldots, X_N$, is the first k-statistic $k_1$. As
mentioned previously, the expectation of $k_1$ in samples from a finite population
is the population mean $\mu$ (which is $\kappa'_1$). That is,


(5.12.1)    $E(k_1) = \kappa'_1 = \mu$

Also the variance of $k_1$, from Eqs. (5.11.7) and (5.8.8), is given by

(5.12.2)    $V(k_1) = \kappa'_2\, \frac{M-N}{MN} = \frac{M-N}{(M-1)N}\, \sigma^2$

which for an infinite population becomes

(5.12.3)    $V(k_1) = \sigma^2 / N$


Similar arguments to those used in § 5.11 (based on relations between angle
brackets and k-statistics) can be used to obtain the higher moments of the
distribution of $k_1$, but the calculations soon become quite complicated. It turns
out that the skewness is given by


(5.12.4)    $Sk(k_1) = \frac{M-2N}{M-2} \left[ \frac{M-1}{N(M-N)} \right]^{1/2} \gamma_1$


and the kurtosis by


(5.12.5)    $Ku(k_1) = \left[ (M-1)(M^2 - 6MN + M + 6N^2)\, \gamma_2 - 6M(MN + M - N^2 - 1) \right] \div \left[ N(M-2)(M-3)(M-N) \right]$


where $\gamma_1 = \kappa_3 / \kappa_2^{3/2}$ and $\gamma_2 = \kappa_4 / \kappa_2^{2}$.
For an infinite population these reduce to


(5.12.6)    $Sk(k_1) = \frac{\gamma_1}{N^{1/2}}$

(5.12.7)    $Ku(k_1) = \frac{\gamma_2}{N}$

It is evident from Eqs. (6) and (7) that for large enough samples the skewness
and kurtosis of the distribution of $k_1$ will be nearly zero, whatever the
corresponding quantities for the population (as long as they are finite). This suggests
that the mean of a large sample from almost any kind of population will have a
distribution close to normal, and in fact, if certain conditions are satisfied, this
result follows from the Central Limit Theorem (see § 4.10).

If the parent population is normal, the mean of a sample of size N is also
normally distributed, with variance $\sigma^2/N$, whether N is large or small. (For a
proof of this, see § 8.2.) If the parent population is not normal but has a finite
variance $\sigma^2$, the variance of the sample mean is still $\sigma^2/N$ and for large N the
distribution is approximately normal. For a parent population not too wildly
skew, a sample size of 30 or more will usually give a satisfactory approximation
to normality.

As an illustration with even smaller sample size, a decidedly skew population
was constructed by writing a number from 0 to 24 on each of 1000 circular
metal-edged cardboard tags. The frequency diagram for this population is
shown in Figure 29. (There were 106 tags, for example, marked 4.) The
numbered discs were put into a goldfish bowl and well mixed. A sample of 10 discs
was drawn and the numbers were noted before the discs were replaced. This
was done repeatedly, and over a considerable period of time 2500 sample means
were obtained. These were grouped in classes 3.0 to 3.9, 4.0 to 4.9, etc. and the
first few k-statistics were calculated. The frequency polygon of the distribution
of these sample means is shown in Figure 29 along with that for the parent
population (the two polygons have different vertical scales, one shown on the
right of the diagram and one on the left). The much more symmetrical nature
of the distribution of means is obvious at a glance. Table 5.2 gives for
comparison (a) the actual characteristics (population parameters) for the parent
population of 1000 discs, (b) the theoretical characteristics for the distribution of
the mean in all possible samples of 10, (c) the estimated values for these
characteristics derived from the k-statistics of 2500 actual samples, (d) the approximate
standard errors for these estimates. For the skewness and kurtosis the standard
errors relate to a normal parent population (see Chapter 8) and are not very
reliable.

FIG. 29 FREQUENCY POLYGONS FOR A SKEW POPULATION AND FOR THE
MEANS OF SAMPLES OF 10

TABLE 5.2

Characteristic    Population    Population of Means    Population of Means
                                (Theoretical)          (Estimated)
Mean              7.601         7.601                  7.640 ± 0.028
Variance          19.57         1.939                  2.006 ± 0.058
Skewness          0.896         0.279                  0.381 ± 0.049
Kurtosis          0.508         0.042                  0.095 ± 0.098


It will be seen that in all cases the estimated values agree with the theoretical 
values within about once or twice the standard error. The difference for the 
skewness is slightly more than twice its standard error. 
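
The theoretical column of Table 5.2 can be reproduced from the finite-population formulas (5.12.2), (5.12.4) and (5.12.5). The sketch below uses a hypothetical function name and takes the rounded population characteristics from the first column of the table as inputs, with M = 1000 and N = 10.

```python
from math import sqrt

def mean_distribution(M, N, sigma2, gamma1, gamma2):
    """Variance, skewness and kurtosis of the mean of samples of size N
    from a finite population of size M, Eqs. (5.12.2), (5.12.4), (5.12.5)."""
    variance = (M - N) / ((M - 1) * N) * sigma2
    skewness = (M - 2 * N) / (M - 2) * sqrt((M - 1) / (N * (M - N))) * gamma1
    kurtosis = ((M - 1) * (M ** 2 - 6 * M * N + M + 6 * N ** 2) * gamma2
                - 6 * M * (M * N + M - N ** 2 - 1)) / (N * (M - 2) * (M - 3) * (M - N))
    return variance, skewness, kurtosis

# Population characteristics as given (rounded) in Table 5.2.
variance, skewness, kurtosis = mean_distribution(
    M=1000, N=10, sigma2=19.57, gamma1=0.896, gamma2=0.508)
print(round(variance, 3), round(skewness, 3), round(kurtosis, 3))
```

For comparison, the infinite-population limits (5.12.6) and (5.12.7) would give a skewness of about 0.283 and a kurtosis of about 0.051, so the finite-population terms matter mainly for the kurtosis here.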


5.13 Confidence Interval for the Mean (in Large Samples)  If we apply the
procedure of § 5.3 to the statistic m (the sample mean) as used to estimate the
parameter $\mu$ (the population mean), we obtain a confidence belt which for fairly
large samples is of almost uniform width. For a given value of $\mu$, the expected
value of m will be $\mu$ and its variance will be $\sigma^2/N$. If the sample size is large
enough for the distribution of the mean to be regarded as normal, or if the parent
population is known to be normal, the sample mean for given $\mu$ will, with
probability 0.95, lie between $\mu - 1.96\sigma N^{-1/2}$ and $\mu + 1.96\sigma N^{-1/2}$. It follows
that for a given m, the 95% confidence interval for $\mu$ lies between $m - 1.96\sigma N^{-1/2}$
and $m + 1.96\sigma N^{-1/2}$, if $\sigma$ is known (Figure 30). If $\sigma$ is not known, it may be
replaced by an estimate such as the sample standard deviation. There is, however,
a better procedure available when $\sigma$ has to be estimated from a fairly small
sample and when the parent population can be taken as normal. This procedure
will be described in § 8.5.


FIG. 30 CONFIDENCE BELT FOR THE SAMPLE MEAN
WITH KNOWN POPULATION VARIANCE


EXAMPLE 1  For a sample of 345 11-year-old boys, the mean weight was
found to be 74.71 lb and the standard deviation 10.65 lb. Calculate 98%
confidence limits for the mean weight in the population of 11-year-old boys from
which this sample was taken.

Here m = 74.71 lb, and s = 10.65 lb. Using s to estimate $\sigma$ and noting that
for the standard normal law the 98% limits are at ±2.326, we find for $\mu$ the
confidence limits $74.71 \pm 2.326(10.65)/(345)^{1/2} = 74.71 \pm 1.33$ lb, or 73.38 to
76.04 lb.


EXAMPLE 2  The variable X is the lifetime (in days) of test pieces of metal
sheet immersed in tap water, before failure due to corrosion. From a large
number of trials the mean value ($\mu$) of X was found to be 875, with a standard
deviation of 85. For further routine testing, how large should the samples be if
the average life (m) from such a sample is to differ from $\mu$ by not more than
5%, with probability 0.90?

Since 5% of $\mu$ is 43.75, the requirement is that $P(|m - \mu| < 43.75) = 0.90$.
Assuming a normal distribution, the probability 0.90 corresponds to a
standardized variate of 1.645. Therefore, $43.75/(\sigma N^{-1/2}) = 1.645$, with $\sigma = 85$. This
gives $N \approx 10$.
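
Both examples reduce to a normal quantile and a square root, as the following sketch shows. The function names are hypothetical; the normal quantile comes from Python's standard-library `statistics.NormalDist`, and the large-sample normal approximation of this section is assumed throughout.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_limits(m, s, n, confidence):
    """Two-sided large-sample confidence limits for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / sqrt(n)
    return m - half_width, m + half_width

def sample_size_for_margin(sigma, margin, probability):
    """Unrounded N for which P(|m - mu| < margin) equals the given probability."""
    z = NormalDist().inv_cdf(0.5 + probability / 2)
    return (z * sigma / margin) ** 2

# Example 1: 345 boys, mean 74.71 lb, standard deviation 10.65 lb, 98% limits.
low, high = mean_confidence_limits(74.71, 10.65, 345, 0.98)

# Example 2: sigma = 85 days, margin = 5% of 875 days, probability 0.90.
n_required = sample_size_for_margin(85, 0.05 * 875, 0.90)

print(round(low, 2), round(high, 2))
print(round(n_required, 1))
```

The unrounded sample size in Example 2 is a little above 10, so a sample of 11 would be needed if the stated probability is to be met or exceeded rather than approximated.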


FIG. 31 CONFIDENCE LIMITS FOR THE PARAMETER
OF A BINOMIAL DISTRIBUTION (horizontal axis: No. of Successes)


5.14 Confidence Limits for the Probability of Success in a Binomial Population
If X is the number of successes in N trials, the probability of success in each trial
being $\theta$, we know from § 3.3 that


(5.14.1)    $E(X) = N\theta$

and

(5.14.2)    $V(X) = N\theta(1 - \theta)$

If N is fairly large and $\theta$ not too near 0 or 1, the distribution of X is
approximately normal, particularly if we make the correction for continuity mentioned
in § 3.11. Thus if $\theta_1$, $\theta_2$ are the lower and upper 95% confidence limits for $\theta$,
we have (see Figure 31)


(5.14.3)    $X - \tfrac{1}{2} - N\theta_1 = 1.96\,[N\theta_1(1 - \theta_1)]^{1/2}$

and

(5.14.4)    $N\theta_2 - (X + \tfrac{1}{2}) = 1.96\,[N\theta_2(1 - \theta_2)]^{1/2}$


These two equations give $\theta_1$ and $\theta_2$ respectively, as the solution of a quadratic
equation.

If N is quite large, it will often be sufficient to replace $\theta_1$ or $\theta_2$ on the
right-hand side of Eqs. (3) and (4) by the sample proportion X/N, and to ignore the
continuity correction. If we do so, the approximate confidence limits are given by


(5.14.5)    $N\theta_1 = X - 1.96 \left[ X \left( 1 - \frac{X}{N} \right) \right]^{1/2}$

and

(5.14.6)    $N\theta_2 = X + 1.96 \left[ X \left( 1 - \frac{X}{N} \right) \right]^{1/2}$

EXAMPLE 3  If in 400 binomial trials we find 280 successes, what are the 95%
confidence limits for $\theta$?

(a) The approximate limits given by Eqs. (5) and (6) are


    $400\theta_1 = 280 - 1.96[280(0.30)]^{1/2} = 262$

    $\theta_1 = 0.655$

and

    $400\theta_2 = 280 + 1.96[280(0.30)]^{1/2} = 298$

    $\theta_2 = 0.745$


(b) From Eq. (3), on squaring both sides, we obtain


    $(279.5 - 400\theta_1)^2 = (3.84)(400)\,\theta_1(1 - \theta_1)$


which, on collecting terms and dividing by the coefficient of $\theta_1^2$, becomes


    $\theta_1^2 - 1.3937\,\theta_1 + 0.4836 = 0$


The solution of this quadratic gives as the smaller root (the only one that
satisfies the original equation before squaring) $\theta_1 = 0.652$. Eq. (4) similarly gives
the quadratic equation

    $\theta_2^2 - 1.3987\,\theta_2 + 0.4871 = 0$


the larger root of which is 0.744. In this example the approximate method gives
almost as good results as the more exact one.
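
The two methods of Example 3 can be compared side by side in code. This is a sketch with hypothetical function names: `approximate_limits` follows Eqs. (5.14.5) and (5.14.6), while `quadratic_limits` squares Eqs. (5.14.3) and (5.14.4) and keeps, in each case, the one root that satisfies the equation before squaring.

```python
from math import sqrt

def approximate_limits(x, n, z=1.96):
    """Eqs. (5.14.5)-(5.14.6): X/N replaces theta under the square root."""
    half_width = z * sqrt(x * (1 - x / n))
    return (x - half_width) / n, (x + half_width) / n

def quadratic_limits(x, n, z=1.96):
    """Eqs. (5.14.3)-(5.14.4), with the continuity correction of one half."""
    def roots(a):
        # (a - n*theta)^2 = z^2 * n * theta * (1 - theta), rearranged as
        # A*theta^2 + B*theta + C = 0.
        A = n * n + z * z * n
        B = -(2 * a * n + z * z * n)
        C = a * a
        disc = sqrt(B * B - 4 * A * C)
        return (-B - disc) / (2 * A), (-B + disc) / (2 * A)
    lower = roots(x - 0.5)[0]   # smaller root of the equation for theta_1
    upper = roots(x + 0.5)[1]   # larger root of the equation for theta_2
    return lower, upper

approx = approximate_limits(280, 400)
exact_normal = quadratic_limits(280, 400)
print([round(v, 3) for v in approx])
print([round(v, 3) for v in exact_normal])
```

The sketch does not guard against X = 0 or X = N, where the continuity-corrected counts fall outside the possible range.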
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5.15 Confidence Limits for the Difference of Probabilities in Two Binomial 
Populations  If we are given two fairly large samples which we suspect may
come from two binomial populations with different parameters $\theta_1$ and $\theta_2$, we
can similarly construct confidence limits for the difference of these parameters.
If the confidence interval includes the value zero, the inference will be that the
parameters are not significantly different (at the level of significance determined
by the confidence coefficient).


Let us suppose that d is the difference of the two sample proportions of
successes:


(5.15.1)    $d = p_1 - p_2$

Then, by Theorem 1.16 and Bienaymé's Theorem (§ 2.14), we have

(5.15.2)    $E(d) = E(p_1) - E(p_2) = \theta_1 - \theta_2$

(5.15.3)    $V(d) = V(p_1) + V(p_2) = N_1^{-1}\theta_1(1 - \theta_1) + N_2^{-1}\theta_2(1 - \theta_2)$


If the samples are large enough that we may use the normal approximation, d
will also be approximately normal, and


(5.15.4)    $z = \frac{d - (\theta_1 - \theta_2)}{[N_1^{-1}\theta_1(1 - \theta_1) + N_2^{-1}\theta_2(1 - \theta_2)]^{1/2}}$


is approximately a standard normal variate. For the 95% limits we may put
$z = \pm 1.96$, and solve for $\theta_1 - \theta_2$. Since we do not know $\theta_1$ and $\theta_2$
separately, we must replace them in the denominator of Eq. (4) by their
estimators, $p_1$ and $p_2$. The 95% confidence limits are then given
approximately by


(5.15.5)    $\theta_1 - \theta_2 = d \pm 1.96\,[N_1^{-1}p_1(1 - p_1) + N_2^{-1}p_2(1 - p_2)]^{1/2}$

EXAMPLE 4  A company selling "XX" tires conducted a survey among car
owners in each of two districts, A and B. In district A, 750 persons said they
planned to purchase tires shortly and 300 said they intended to buy the XX
brand. In district B, 600 persons planned to purchase tires and 210 intended to
get XX tires. Does there appear to be a significant difference between districts
A and B with regard to the proportions of prospective XX purchasers?

Here d = 0.40 - 0.35 = 0.05. The approximate standard error of d is


    $\left[ \frac{(0.40)(0.60)}{750} + \frac{(0.35)(0.65)}{600} \right]^{1/2} = 0.0264$


so that


    $\theta_1 - \theta_2 = 0.05 \pm 0.052$
                        = -0.002 to 0.102

Since the interval includes zero (although only just) we can say that at the 5%
level the observed difference d is not significant. The result is, however, close to
the borderline of significance.
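
The calculation in Example 4 is a direct application of Eq. (5.15.5) and can be sketched as follows. The function name is hypothetical, and the counts are those quoted in the example.

```python
from math import sqrt

def difference_limits(x1, n1, x2, n2, z=1.96):
    """Approximate confidence limits for theta_1 - theta_2, Eq. (5.15.5)."""
    p1, p2 = x1 / n1, x2 / n2
    d = p1 - p2
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d, std_error, d - z * std_error, d + z * std_error

# District A: 300 of 750; district B: 210 of 600.
d, std_error, lower, upper = difference_limits(300, 750, 210, 600)
includes_zero = lower <= 0.0 <= upper
print(round(d, 3), round(std_error, 4), round(lower, 3), round(upper, 3), includes_zero)
```

With these counts the lower limit falls just below zero, which is the borderline situation described in the text.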


* 5.16 Sampling for Proportion of Successes from a Finite Population  If the
sample of size N is drawn "without replacements" from a finite population of
size M, the distribution of the sample proportion of successes p is not binomial
but hypergeometric (see § 3.5). The expectation and variance of p are given by


(5.16.1)    $E(p) = \theta$
            $V(p) = \frac{M-N}{N(M-1)}\, \theta(1 - \theta)$

For large N (and of course still larger M), the distribution of p is approximately
normal, and confidence limits for $\theta$ may be determined as in § 5.14, with the
appropriate correction for the variance.


* 5.17 Use of Binomial Probability Paper  A special graph paper, designed
by Mosteller and Tukey [5], may be used to obtain quick approximate solutions
of estimation problems involving binomial populations. The scheme is based on
Fisher's angular transformation (see § 3.15), $p = \sin^2 A$, which has the effect of
making the variance of A a function of the sample size only (proportional to
1/N) and also of improving the approximation to normality. A specimen of this
graph paper is shown in Figure 32.

The scales of x and y are square-root scales. The horizontal distance of a
point marked x from the origin is proportional to $x^{1/2}$, and similarly for y. A
quarter-circle is drawn through the points marked 100 on each axis, and on this
circle x + y = 100. The angles A, in degrees, are marked on this circle and the
abscissa of a point A is the corresponding p (multiplied by 100). At a distance of
$\sqrt{N}$ from the origin, in a direction given by A, the variance of A on the circle of
radius $\sqrt{N}$ is practically constant, independent both of N and of $\theta$. Any straight
line through the origin passes through points for which y/x is constant, and is
called a split. A 40-60 split, for example, passes through the point x = 40,
y = 60.

Suppose that in a sample of 10 we find 7 "successes," and therefore 3
"failures." We say that the paired count for the sample is (7,3) and plot it as a
right-angled triangle with the right angle at (7,3) and the sides each one unit long,
parallel to the axes. When one of the coordinates is larger than about 100,
the one-unit length is scarcely more than the width of a pencil line.

In order to test whether the observed value of p (7/10) is significantly different
from a hypothetical $\theta$ (say 1/2), we measure the perpendicular distance from
the plotted triangle to the 50-50 split. When the numbers x and y are small, there
are two distances, called the short and the long distance, measured from the two
acute angles of the triangle, and these are interpreted by reference to the scale
at the top of the paper (marked Full Scale). A distance of one unit on this scale
corresponds to a standard normal deviate of 1, so that a distance of two units on
the scale represents very nearly the 5% level of significance (when we are
interested in the magnitude of the difference between p and $\theta$ rather than in the sign).
The long and short distances each give a significance level and the observed
result must be regarded as significant at some level in between. In the illustration
above, the two distances are 1.6 and 1.0, so that the observed p is not significantly
different from 1/2 at the 5% level of significance.

FIG. 32 BINOMIAL PROBABILITY GRAPH PAPER


EXAMPLE 5  In an opinion poll 124 "yes" answers were received to a certain
question out of 200 replies. Find 95% confidence limits for the true proportion
of persons who would answer "yes" in the population sampled.

The paired count is (124,76), and this is plotted as P in Figure 32 (the triangle
is practically a point). Two splits are drawn such that they lie at perpendicular
distances of two scale units from P. These splits cut the quarter-circle at (55, 45)
and (69, 31) so that the 95% confidence limits for $\theta$ are 0.55 and 0.69.
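
The graphical construction has a simple numerical counterpart, sketched below under stated assumptions. It takes the angle as $A = \arcsin\sqrt{p}$, uses the standard large-sample result that the variance of A is approximately $1/(4N)$ in radians squared, and steps two standard deviations either side of the observed angle, mirroring the two scale units used on the paper. The function name is hypothetical, and this is an analytic stand-in for the graph paper, not a description of how the paper itself is ruled.

```python
from math import asin, pi, sin, sqrt

def angular_limits(x, n, z=2.0):
    """Approximate limits for theta from the angular transformation,
    assuming Var(A) is about 1/(4n) with A measured in radians."""
    angle = asin(sqrt(x / n))
    half_width = z / (2 * sqrt(n))
    lower_angle = max(angle - half_width, 0.0)
    upper_angle = min(angle + half_width, pi / 2)
    return sin(lower_angle) ** 2, sin(upper_angle) ** 2

# Example 5: 124 "yes" answers out of 200 replies.
lower, upper = angular_limits(124, 200)
print(round(lower, 2), round(upper, 2))
```

The result agrees with the limits read from the paper to about the accuracy with which a graph can be read.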
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5.18 Confidence Limits for the Parameter of a Binomial Distribution with 
Small Samples  The normal approximation is not really justified for small
samples, particularly when $\theta$ is not close to 0.5. By the use of cumulative
binomial tables (such as those mentioned in References [3] and [4] of Chapter 3)
it is possible to determine the parameter $\theta_1$ of a binomial such that, for example,
the observed value of X cuts off the upper 2½% tail of the distribution. In the
same way, $\theta_2$ can be found such that the same X cuts off the lower 2½%, X itself
being included in the tail (see Figure 31). These values $\theta_1$ and $\theta_2$ give the 95%
confidence limits for $\theta$.

Thus if N = 20 and X = 6, we find from the tables that B(6, 20, 0.11)
= 0.01755 and B(6, 20, 0.12) = 0.02602. By interpolation, the value of $\theta_1$
corresponding to B(6, 20, $\theta$) = 0.025 is about 0.119. Also B(7, 20, 0.55)
= 0.97859 and B(7, 20, 0.54) = 0.97349, giving $\theta_2$, corresponding to 0.975,
as 0.543. The 95% confidence limits for $\theta$ are therefore 0.119 and 0.543.

It may be noted for comparison that the approximate method of Eqs.
(5.14.5) and (5.14.6) gives 0.099 and 0.501. The method of Eqs. (5.14.3) and
(5.14.4) gives 0.128 and 0.543, so that even with an N as small as 20 the normal
approximation, with a continuity correction, is fairly satisfactory.
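
Where tables are not to hand, the same limits can be found by solving the tail equations numerically. The sketch below is one way of doing so, not the procedure of the text: it takes B(x, N, θ) to mean the probability of x or more successes (the reading consistent with the tabulated values quoted above), uses simple bisection in place of interpolation, and uses hypothetical function names. It does not handle X = 0 or X = N.

```python
from math import comb

def upper_tail(x, n, theta):
    """B(x, n, theta): probability of x or more successes in n trials."""
    return sum(comb(n, r) * theta ** r * (1 - theta) ** (n - r)
               for r in range(x, n + 1))

def solve_increasing(f, target, lo=0.0, hi=1.0, steps=60):
    """Bisection for f(theta) = target, where f increases with theta."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_limits(x, n, tail=0.025):
    """theta_1: X cuts off the upper tail; theta_2: X cuts off the lower tail."""
    theta_1 = solve_increasing(lambda t: upper_tail(x, n, t), tail)
    theta_2 = solve_increasing(lambda t: upper_tail(x + 1, n, t), 1 - tail)
    return theta_1, theta_2

theta_1, theta_2 = exact_limits(6, 20)
table_check = upper_tail(6, 20, 0.11)   # tabulated above as 0.01755
print(round(theta_1, 3), round(theta_2, 3), round(table_check, 5))
```

The upper limit uses B(x + 1, N, θ) = 0.975 because the lower tail, which includes X itself, is the complement of the probability of X + 1 or more successes.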

Mention may be made of special tables [6] by Mainland and others,
prepared for the Department of Medical Statistics at New York University College
of Medicine. These give 95% and 99% confidence limits for $\theta$ for a considerable
range of sample sizes and observed proportions, and include all cases that are
likely to arise in practical statistics.


PROBLEMS


A. (§§ 5.1-5.6)

1. The variate X is distributed in a population with density $f(x) = 2(\theta - x)/\theta^2$,
$0 < x < \theta$. It is desired to estimate $\theta$ from a single observation by using the statistic
T = 2x. Write down the density function for T, integrate to find $F(t \mid \theta)$, and calculate
the values of $t_1$ and $t_2$ from Eq. (5.3.1) when $\epsilon = 0.05$. Plot the curves $C_\epsilon$ and $C_{1-\epsilon}$
for $\theta < 1$. If the observed x is 0.02, find 90% confidence limits for $\theta$. Hint: $t_1$ and $t_2$
are given by solutions of quadratic equations for $t_1/\theta$ and $t_2/\theta$. In each case only one
solution is possible, since $t < 2\theta$.

2. If X is uniformly distributed on the interval $(\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2})$,
an estimator for $\theta$ is the mid-range of a sample, that is, the mean of the smallest and the
largest observed values. If T is this estimator for a sample of size 4, the density function
for T is $f(t) = 32(0.5 - |t - \theta|)^3$. If $\theta = 1$, calculate the values of $t_1$ and $t_2$
corresponding to $\epsilon = 0.025$. Sketch the confidence belt for $\theta$ and find 95% confidence
limits corresponding to an observed t = 1.2. Hint: If $\theta = 1$, $0.5 < t < 1.5$. Treat
the two cases separately.

3. If X is normally distributed with mean $\mu$ and variance $\sigma^2$, the arithmetic mean
$\bar{X}$ of a sample of size N is normally distributed with mean $\mu$ and variance $\sigma^2/N$. If $\bar{X}$
is used as an estimator of $\mu$, find 99% confidence limits for $\mu$ corresponding to an
observed value of $\bar{X}$. (The variance $\sigma^2$ is assumed to be known.)

4. Assuming the distribution of the arithmetic mean, as given in Problem 3, show
that $\bar{X}$ is a consistent and unbiased estimator of $\mu$.

5. The distribution of the mid-range of a sample of size N for a uniform distribution
of X on $(\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2})$ has the density function $f(t) = N 2^{N-1} (\tfrac{1}{2} - |t - \theta|)^{N-1}$.
(Compare Problem A-2, for N = 4.) Show that the mid-range is a consistent and
unbiased estimator of $\theta$. Hint: Prove that $E(T - \theta) = 0$ and $E(T - \theta)^2 = [2(N + 1)(N + 2)]^{-1}$.
Treat separately the integrals from $\theta - \tfrac{1}{2}$ to $\theta$ and from $\theta$ to $\theta + \tfrac{1}{2}$.

6. If X has a rectangular (uniform) distribution on the interval $(0, \theta)$ and if R is the
range for a sample of size N (the highest value $x_N$ minus the lowest value $x_1$), the
distribution function for R, for a given $\theta$, is $F(r \mid \theta) = (r/\theta)^N (N\theta/r - N + 1)$. Show
that the statistic $T = R/\theta$ has a distribution independent of $\theta$ and that fiducial limits for
$\theta$ with confidence coefficient $\alpha$ are given by $x_N$ and $R/t_\alpha$, where $\alpha$ is the probability that
$T > t_\alpha$. Hint: $\alpha$ is the probability that $t_\alpha < R/\theta < 1$, i.e., that $1 < \theta/R < t_\alpha^{-1}$. Write
this as a fiducial probability for $\theta$. For given $\alpha$, $t_\alpha$ is the root of an equation of degree N.
Note that $\theta$ must be at least equal to $x_N$.

7. For the rectangular distribution $f(x) = \theta^{-1}$, $0 < x < \theta$, prove that fiducial
limits for $\theta$ with coefficient $\alpha$, based on a sample of size two with values $x_1$ and $x_2$, are
$x_2$ and $(x_2 - x_1)/(1 - \alpha^{1/2})$. Hint: Use Problem 6 with N = 2.

8. For the same distribution as in Problem 7, an estimator of $\theta$, based on a sample
of two, is $x_1 + x_2$. The density function is $f(t) = t/\theta^2$ for $t < \theta$, and $f(t) = (2\theta - t)/\theta^2$
for $t > \theta$. Show that confidence limits for $\theta$, with confidence coefficient $\alpha$, are given by
$(x_1 + x_2)/[2 - (1 - \alpha)^{1/2}]$ and $(x_1 + x_2)/(1 - \alpha)^{1/2}$, except that when the lower limit
is below $x_2$ it must be replaced by $x_2$.

Work out numerical values if $x_1 = 3$, $x_2 = 5$, and $\alpha = 0.9$. Compare these limits
with those given by Problem 7 for the same data.

9. (a) A sample of N objects is taken from a large binomial population in which a
proportion $\theta$ of the objects possess a certain attribute A. If p is the proportion of
objects possessing this attribute in the sample, show that $pq/(N - 1)$ is an unbiased
estimator of $\theta(1 - \theta)/N$, where $q = 1 - p$.

(b) Suppose that the sample is selected, one item at a time, until m of the selected
items are A's. Calculate the probability that the size of the sample is N, and show that
$(m - 1)/(N - 1)$ is an unbiased estimator of $\theta$. Hint: Find the probability that in the
first N - 1 items there are m - 1 A's and that the N-th item is an A. The distribution
of N - m is negative binomial.


B. (§§ 5.7-5.11)


1. Write out the proof of the statement in Eq. (5.7.8), that $E(\langle pq \rangle) = \mu'_{pq}$.

2. In the following table, X represents the number of defective items produced
by a machine in one day's operation, and f is the frequency of occurrence of X over a
period of 200 days. Compute the first four k-statistics for this empirical distribution,
which is roughly Poisson. (Note that X is discrete. There is no occasion to use either
an auxiliary variable or Sheppard's corrections.)


    X             f
    0           102
    1            59
    2            31
    3             8
    4 or more     0
    Total       200


3. Find the standard errors of $k_1$ and $k_2$ and the estimated covariance of $k_1$ and $k_2$
for the data of Problem 2.

4. Find the standard errors of $k_1$ and $k_2$ as calculated for the u variable, for the
data on weights used in §§ 5.9 and 5.10. (Use the corrected values of $k_2$ and $k_4$.)


C. (88 5.12-5.16) > 

. 1. А normal population has mean 20 and standard deviation 2. A sample of six 
items from the population has a mean 18.2. Can the sample be reasonably regarded 
as а random one, using the 5% level of significance? Hint: Calculate the probability 
that а random sample would have a mean differing from 20 by as much as 1.8. 
Alternatively, find 95% confidence limits Гог ш and see whether these include the 
true value 20. 

2. A normal population of times has a standard deviation 0.104 sec. A random
sample of 12 items from the population has a mean 12.33 sec. Calculate 90%
confidence limits for the population mean. What is the smallest sample size we should
use if we want to be 95% sure that the sample mean will not differ from the (unknown)
population mean by more than 0.05 sec.?

3. A group of 120 freshmen in arts at a large university take an achievement test
in mathematics and obtain a mean score 70 with a standard deviation 14. Another
group of 80 in engineering take the same test and obtain a mean score 75 with a standard
deviation 12. Is the difference in the means significant at the 1% level? Hint: The
samples are large enough for the populations to be regarded as normal. The variance
of the difference of the means is the sum of the variances of the two means separately.
(Compare § 5.15.) As an estimate of the population variance take the weighted mean
of the sample variances, weighted according to the sample size less one. Assume both
populations of freshmen are large compared with the sample sizes.

4. If 400 eggs are selected at random from a large consignment and 50 are found
to be bad, what are the approximate 99% confidence limits for the proportion of bad
eggs in the whole consignment? Calculate also the more exact confidence limits for
comparison.

5. A physician treats 20 patients suffering from a certain disease and 11 of them die.
The mortality rate for this disease, based on ... thousands of cases, is 42%. Is the
physician's sample significantly different from the population, at the 5% level?
Hint: Calculate the probability ... assuming normality.

6. In ... of a drug said to prevent sea-sickness, 25 men who
had always developed symptoms of sickness when subjected to the motion of a rocking
machine were given the drug. On a further trial with the machine, 15 of these men were
found to be immune to the motion. Find 95% confidence limits for the proportion ...
(Assume approximate normality.)

7. ... women the question was asked, "Do you approve of ...?" and 89 of the men
and 116 of the women ... approximate 95% confidence limits for the difference ...
the male population sampled and that in ...

8. Random samples of 50 students each were taken from (a) a freshman class in
arts and science numbering 248 and (b) a freshman class in engineering numbering 187.
Both sample groups were given a mathematical aptitude test, and the numbers reaching
a pass standard were 35 and 41 respectively. Test the hypothesis that the proportion
of passes would be the same in both classes if all members were tested. Hint: The two
populations are finite. Use Eq. (5.16.1). Calculate the probability of a difference as
great numerically as that observed if the stated hypothesis were true.

9. A research worker wishes to estimate the mean of a population using a random
sample so large that the probability will be at least 0.95 that the sample mean will not
differ from the population mean by more than 25% of the population standard
deviation. How large should the sample be?

10. Obtain some binomial probability paper and solve Problems C-5 and C-6 
graphically.


Chapter 6


ESTIMATION, TESTING AND DECISION MAKING


6.1 Maximum Likelihood Point Estimation  A very useful method of estimation,
which has been vigorously promoted by Fisher, is the method of maximum
likelihood. The general idea is to choose as estimator of a parameter $\theta$
that function of the sample observations which will, when substituted for $\theta$,
make the probability of the sample a maximum. In other words, for this value
of $\theta$ the observed sample is also the most likely sample.

Consider, for instance, a binomial population with parameter $\theta$. The
variate observed is the number of successes $X$ in $N$ trials, and the probability
that $X = x$ is

$$f(x|\theta) = \binom{N}{x}\theta^{x}(1-\theta)^{N-x}$$

As a function of $\theta$, this is a maximum when $\partial f/\partial\theta = 0$ and
$\partial^2 f/\partial\theta^2 < 0$. Since

$$\frac{\partial f}{\partial\theta}
  = \binom{N}{x}\left[x\theta^{x-1}(1-\theta)^{N-x} - (N-x)\theta^{x}(1-\theta)^{N-x-1}\right]
  = f\left[x\theta^{-1} - (N-x)(1-\theta)^{-1}\right]$$

the critical value $\hat\theta$ of $\theta$ is given by

$$x\hat\theta^{-1} - (N-x)(1-\hat\theta)^{-1} = 0$$

or

$$\hat\theta = \frac{x}{N}$$

It is easy to verify that this value does indeed correspond to a maximum for $f$
and not a minimum. The maximum likelihood estimator is therefore identical
with the unbiased estimator used for $\theta$ in Chapter 5.
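The result $\hat\theta = x/N$ can be checked numerically. The following is a minimal sketch, not part of the original text: it maximizes the binomial log-likelihood over a grid of $\theta$ values and compares the maximizer with $x/N$. The function names, the grid resolution, and the sample values (7 successes in 20 trials) are illustrative assumptions.

```python
from math import log


def binomial_log_likelihood(theta, x, n):
    """Log-likelihood of x successes in n trials.

    The constant term log C(n, x) is omitted because it does not depend on theta.
    """
    return x * log(theta) + (n - x) * log(1.0 - theta)


def grid_mle(x, n, steps=1000):
    """Return the grid point in (0, 1) that maximizes the log-likelihood."""
    grid = [i / steps for i in range(1, steps)]  # avoids theta = 0 and theta = 1
    return max(grid, key=lambda t: binomial_log_likelihood(t, x, n))


# Hypothetical data: 7 successes in 20 trials.
theta_hat = grid_mle(7, 20)
print(theta_hat, 7 / 20)
```

A grid search is used here only to avoid relying on the calculus; any one-dimensional optimizer would serve equally well.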
If the continuous variate $X$ has a probability density $f(x|\theta, \theta_r)$ which
depends on a parameter $\theta$ and possibly on other parameters represented jointly
by $\theta_r$, the likelihood of a set of sample values $x_1, x_2, \ldots, x_N$ is defined by

$$L = f(x_1|\theta, \theta_r)\, f(x_2|\theta, \theta_r) \cdots f(x_N|\theta, \theta_r) \tag{6.1.1}$$

The likelihood is therefore a joint probability density for the whole sample, but
not a probability. When $X$ is discrete, $L$ is just the joint probability for the
observed sample (all the items being supposed independently selected from the
same population). The principle of maximum likelihood states that we should
choose as an estimator of the parameter $\theta$ that statistic $T$ (if it exists) which
maximizes $L$ for variations in $\theta$, whatever the values of the other parameters $\theta_r$
may be.

In practice, the logarithm of $L$ is usually more convenient than $L$ itself.
Since $\log L$ is a monotone-increasing function of $L$, a value of $\theta$ which maximizes
$L$ also maximizes $\log L$. The maximum likelihood estimator is therefore given
by solving for $\theta$ the relations

$$\frac{\partial}{\partial\theta}(\log L) = 0, \qquad
  \frac{\partial^2}{\partial\theta^2}(\log L) < 0 \tag{6.1.2}$$


EXAMPLE 1  For a normal population, with parameters $\mu$, $\sigma$, the density
function is

$$f(x|\mu, \sigma) = (2\pi\sigma^2)^{-1/2} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

Therefore, for a sample of size $N$ with values $x_1, x_2, \ldots, x_N$,

$$L = (2\pi\sigma^2)^{-N/2} \exp\left[-\sum_i \frac{(x_i-\mu)^2}{2\sigma^2}\right]$$

and

$$\log L = -\frac{N}{2}\log(2\pi) - N\log\sigma - \sum_i \frac{(x_i-\mu)^2}{2\sigma^2} \tag{6.1.3}$$

Differentiating with respect to $\mu$, we find

$$\frac{\partial(\log L)}{\partial\mu} = \sum_i \frac{x_i-\mu}{\sigma^2}$$

and

$$\frac{\partial^2(\log L)}{\partial\mu^2} = -\frac{N}{\sigma^2}$$

The maximum likelihood estimator for $\mu$ is therefore $\hat\mu$, given by

$$\sum_i (x_i - \hat\mu) = 0$$

or

$$\hat\mu = \frac{1}{N}\sum_i x_i = m$$

the sample mean. This result is independent of the value of the other parameter $\sigma$.

If, however, we try to find an estimator for $\sigma$ in the same way, we get

$$\frac{\partial(\log L)}{\partial\sigma} = -\frac{N}{\sigma} + \sum_i \frac{(x_i-\mu)^2}{\sigma^3}$$

and, on putting this equal to zero, the value of $\hat\sigma$ is not independent of $\mu$. The
extraneous parameter in such a case is often called a nuisance parameter. There
is no maximum likelihood estimator for $\sigma$ by itself, but it is possible to get joint
maximum likelihood estimators for $\mu$ and $\sigma$ by solving together the two equations,

$$\frac{\partial}{\partial\mu}(\log L) = 0, \qquad \frac{\partial}{\partial\sigma}(\log L) = 0$$

These give $\hat\mu = m$, $\hat\sigma^2 = \frac{1}{N}\sum_i (x_i - m)^2$, so that the joint maximum likelihood
estimators for $\mu$ and $\sigma^2$ are the sample mean ($m$) and the sample second moment
($m_2$). It may be noted that, although the former is unbiased, the latter is not,
since, as we have already found,

$$E(m_2) = (N-1)\frac{\sigma^2}{N}$$

and not $\sigma^2$ itself.
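As an illustration of Example 1, the joint estimators can be computed directly from a sample. The sketch below is not from the original text; the function names and the five data values are hypothetical. It returns the sample mean $m$ and second moment $m_2$, and also the rescaled value $N m_2/(N-1)$, which removes the bias noted above.

```python
def normal_mle(xs):
    """Joint maximum likelihood estimates (m, m2) of (mu, sigma^2) for a normal sample."""
    n = len(xs)
    m = sum(xs) / n                           # sample mean
    m2 = sum((x - m) ** 2 for x in xs) / n    # sample second moment about the mean
    return m, m2


def unbiased_variance(xs):
    """Rescale m2 by N/(N-1), since E(m2) = (N-1) sigma^2 / N."""
    n = len(xs)
    _, m2 = normal_mle(xs)
    return n * m2 / (n - 1)


# Hypothetical sample of five observations.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
m, m2 = normal_mle(sample)
print(m, m2, unbiased_variance(sample))
```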


6.2 Sufficient Estimators  Some characteristics of estimators were mentioned
in § 5.6, but there are others which are also important. A statistic $T$ is
said to be a sufficient estimator of $\theta$, or, in Fisher's terminology, exhaustive, if
it uses all the relevant information in the sample. If the likelihood function is
expressed in the form

$$L = g(t|\theta)\, h(x_1, x_2, \ldots, x_N | t, \theta) \tag{6.2.1}$$

or

$$\log L = \log g + \log h$$

where $g$ is the density function for $T$ and $h$ is the conditional density function for
$x_1, \ldots, x_N$, given that $T = t$, then it may happen that $h$ does not depend on $\theta$.
If so, $T$ is a sufficient estimator of $\theta$.

For suppose $U$ is another statistic obtainable from the observations. The
distribution of $U$ for a given $T$ will depend upon $h$, but since $h$ does not involve
$\theta$, the statistic $U$ can provide no information about $\theta$ which is not already given
by $T$.

It is desirable to have a sufficient estimator where possible, since then we
know that we are utilizing all the information about $\theta$ that we can get from the
sample, but sufficiency alone does not define a statistic very precisely. If $T$ is
sufficient, so is a function of $T$.

Sufficient statistics exist in only a relatively few special cases. It is one of the
merits of the maximum likelihood method of estimation that if a sufficient
statistic does exist for a parameter the maximum likelihood estimator is sufficient.



In Example 1 above, the sum $\sum_i (x_i - \mu)^2$ occurs in the expression for $\log L$
and this can be split up into a part depending on $m$ and $\mu$ and a part independent
of $\mu$. Thus

$$\sum_i (x_i - \mu)^2 = \sum_i (x_i - m + m - \mu)^2
  = \sum_i (x_i - m)^2 + N(m-\mu)^2 + 2(m-\mu)\sum_i (x_i - m)
  = N m_2 + N(m-\mu)^2$$

since $\sum_i (x_i - m) = 0$ from the definition of $m$. Equation (6.1.3) can therefore
be written

$$\log L = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{N m_2}{2\sigma^2} - \frac{N(m-\mu)^2}{2\sigma^2}$$

so that, apart from constants,

$$\log g = -\frac{N}{2\sigma^2}(m-\mu)^2, \qquad \log h = -\frac{N m_2}{2\sigma^2}$$

It is clear that $h$ is a function of the sample values not depending on $\mu$, while $g$
is a function of the estimator $m$ and of $\mu$. Therefore, $m$ is a sufficient estimator
for $\mu$.
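The decomposition of the sum of squares used above is easy to confirm numerically. The following sketch is an illustration only; the function name and the data are hypothetical. It returns the total $\sum_i (x_i - \mu)^2$ together with the two parts $N m_2$ (free of $\mu$) and $N(m - \mu)^2$ (depending on the sample only through $m$).

```python
def split_sum_of_squares(xs, mu):
    """Return (total, n*m2, n*(m - mu)**2); the last two should add up to the first."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    total = sum((x - mu) ** 2 for x in xs)
    return total, n * m2, n * (m - mu) ** 2


# Hypothetical sample, with an arbitrary trial value mu = 1.
total, part_h, part_g = split_sum_of_squares([1.0, 2.0, 3.0, 4.0, 5.0], mu=1.0)
print(total, part_h + part_g)
```

Changing `mu` alters only the third component, which is the numerical counterpart of the statement that $h$ does not depend on $\mu$.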


6.3 Properties of Maximum Likelihood Estimators  The following five
properties are the main reasons for recommending the use of maximum
likelihood (m.l.) estimators:

(a) The m.l. estimator is consistent. If $f(x|\theta)$ is continuous in $x$, and also
continuous and monotonic in $\theta$ over an interval including the true value $\theta_0$, and
if $T$ is the m.l. estimator of $\theta_0$, then $T$ converges in probability to $\theta_0$ as the sample
size increases. The proof holds also for a discrete variate if we replace each
value by an interval over which we suppose the frequency distributed uniformly.
Details may be found in [1].

(b) The m.l. estimator tends to normality as $N$ increases. The conditions in
(a) are supposed to hold, together with some further conditions on the continuity
of $\partial f/\partial\theta$.

(c) The m.l. estimator is most-efficient (see § 5.6). The variance of the m.l.
estimator $T$ is given by

$$[V(T)]^{-1} = N\int_{-\infty}^{\infty}
  \left(\frac{\partial \log f}{\partial\theta}\right)^2_{\theta=\theta_0} f\,dx \tag{6.3.1}$$

where, on the right-hand side, $\theta$ is to be put equal to $\theta_0$. If the domain of $f$ does
not depend on $\theta$, Eq. (1) is equivalent to

$$[V(T)]^{-1} = -N\,E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right)_{\theta=\theta_0} \tag{6.3.2}$$

Since $\log L = \sum_i \log f(x_i)$, and since the $x_i$ all have the same distribution,
Eq. (2) may be written

$$[V(T)]^{-1} = -E\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta=\theta_0} \tag{6.3.3}$$

This is a convenient way of finding the variance of a m.l. estimator.

It can be shown that if $U$ is any estimator for $\theta_0$, which in large samples is
normally distributed about $\theta_0$ with variance $V(U)$, and if the domain of $f$ does
not depend on $\theta$, then

$$[V(U)]^{-1} \le N\,E\left[\left(\frac{\partial \log f}{\partial\theta}\right)^2\right]_{\theta=\theta_0} \tag{6.3.4}$$

It follows from Eq. (1) and property (b) above that the m.l. estimator is most
efficient in the sense described in § 5.6.
(d) If a sufficient estimator exists for $\theta$, the m.l. estimator is sufficient.
For, by § 6.2, if $T$ is sufficient and $g(t|\theta)$ is its density function,

$$L = g(t|\theta)\cdot h(x_1, x_2, \ldots, x_N) \tag{6.3.5}$$

where $h$ is a function of the sample values which does not depend on $\theta$. Therefore,

$$\frac{\partial}{\partial\theta}(\log L) = \frac{\partial \log g}{\partial\theta} = \psi(\theta, t) \tag{6.3.6}$$

The m.l. estimator is given by putting $\psi(\theta, t) = 0$ and solving for $\theta$. The
result is obviously a function of $t$, say $\phi(t)$. The estimator is therefore $\phi(T)$, and
since $T$ is sufficient, so is $\phi(T)$.

(e) The m.l. estimator is invariant under functional transformations. This
means that if $T$ is the m.l. estimator of $\theta$, and if $u(\theta)$ is a function of $\theta$, then $u(T)$
is the m.l. estimator for $u(\theta)$.

If, for example, we are dealing with a normal population (for which
$\mu_4 = 3\sigma^4$) and we know that the m.l. estimator for $\sigma^2$ is the sample second moment
$m_2$ (that is, $N^{-1}\sum_i (x_i - m)^2$), we conclude that the m.l. estimator for $\mu_4$ is
$3m_2^2$ and not $m_4$. Of course, $m_4$ might be used as an estimator, but it would not
have as small a sampling variance as $3m_2^2$.

This property of invariance is not true of all estimators. If $T$ is an unbiased
estimator of $\theta$, for example, it does not follow that $T^2$ is an unbiased estimator
of $\theta^2$.
EXAMPLE 2  If the population is normal, and if $\sigma$ is supposed known, the
m.l. estimator of $\mu$ is the sample mean $m$, as shown in Example 1.
Since $\log f = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(x - \mu)^2$, we have

$$\frac{\partial \log f}{\partial\mu} = \frac{x - \mu}{\sigma^2}$$

$$\frac{\partial^2 \log f}{\partial\mu^2} = -\frac{1}{\sigma^2}$$

and

$$-N\,E\left(\frac{\partial^2 \log f}{\partial\mu^2}\right) = \frac{N}{\sigma^2}$$

The variance of $m$ is therefore $\sigma^2/N$, in agreement with the result found in
Chapter 5.


* 6.4 The Cramér-Rao Inequality  If $T$ is an estimator of $\theta$, and if its bias is $b$,
as defined by the relation

$$E(T) = \theta + b \tag{6.4.1}$$

where $b$ may depend upon $\theta$, then

$$E[(T - \theta)^2] \ge \frac{\left(1 + \dfrac{db}{d\theta}\right)^2}
  {N\displaystyle\int_{-\infty}^{\infty}\left(\frac{\partial \log f}{\partial\theta}\right)^2 f(x|\theta)\,dx} \tag{6.4.2}$$

If $g(t|\theta)$ is the frequency function for $T$, the equality sign in relation (2) holds
if, and only if,

$$\text{(a) } T \text{ is sufficient}; \qquad
  \text{(b) } \frac{\partial \log g}{\partial\theta} = k(T - \theta) \tag{6.4.3}$$

where $k$ is independent of $T$ but may depend on $\theta$.

If $T$ is an unbiased estimator, so that $b = 0$, $E[(T - \theta)^2]$ becomes the variance
of $T$, and Eq. (2) is

$$V(T) \ge \left[N\int_{-\infty}^{\infty}\left(\frac{\partial \log f}{\partial\theta}\right)^2 f(x|\theta)\,dx\right]^{-1} \tag{6.4.4}$$

which is formally the same as (6.3.4), although the conditions imposed on the
estimator are different.

The relation (4) was proved independently by Cramér and Rao, although it
was found earlier by Fisher for the special case of a normal population. For a
proof see [2].

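EXAMPLE 3  For the two-parameter gamma distribution of Eq. (4.4.6),

$$f(x) = \frac{e^{-x/\beta} x^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad 0 < x < \infty$$

suppose that $\alpha$ is known and we wish to find an estimator for $\beta$. We have

$$\log f = -\frac{x}{\beta} + (\alpha - 1)\log x - \alpha\log\beta - \log\Gamma(\alpha)$$

so that

$$\log L = -\sum_i \frac{x_i}{\beta} - N\alpha\log\beta + \text{terms independent of } \beta.$$

Therefore,

$$\frac{\partial(\log L)}{\partial\beta} = \sum_i \frac{x_i}{\beta^2} - \frac{N\alpha}{\beta} = 0$$

giving as the m.l. estimator

$$\hat\beta = \frac{\sum_i x_i}{N\alpha} = \frac{m}{\alpha}$$

where $m$ is the sample mean.

Since, for this distribution $E(x) = \alpha\beta$, it follows that $E(\hat\beta) = \beta$, and the
estimator $\hat\beta$ is therefore unbiased. The variance is given by

$$[V(\hat\beta)]^{-1} = -N\int_0^{\infty}\frac{\partial^2(\log f)}{\partial\beta^2}\, f\,dx
  = -N\int_0^{\infty}\left(\frac{\alpha}{\beta^2} - \frac{2x}{\beta^3}\right) f\,dx
  = \frac{N\alpha}{\beta^2}$$

The variance of $\hat\beta$ is therefore $\beta^2/N\alpha$. The likelihood function may be written

$$\log L = -\sum_i \frac{x_i}{\beta} - N\alpha\log\beta + (\alpha - 1)\sum_i \log x_i - N\log\Gamma(\alpha)
  = \log g + \log h$$

where

$$\log g = -\frac{N\alpha\hat\beta}{\beta} - N\alpha\log\beta + \text{terms independent of } \beta$$

and $\log h$ is independent of $\beta$. This shows that $\hat\beta$ is sufficient. Also,

$$\frac{\partial(\log g)}{\partial\beta} = \frac{N\alpha\hat\beta}{\beta^2} - \frac{N\alpha}{\beta}
  = \frac{N\alpha}{\beta^2}(\hat\beta - \beta)$$

which is of the form of Eq. (6.4.3), condition (b). The two conditions for the
sign of equality in (6.4.4) are therefore satisfied.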
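The conclusion of Example 3 can be illustrated by simulation. The sketch below is not part of the original text: it draws repeated gamma samples with Python's standard `random.gammavariate(alpha, beta)` (whose density has the same form as Eq. (4.4.6) as written above), computes $\hat\beta = m/\alpha$ for each, and compares the empirical mean and variance with $\beta$ and $\beta^2/N\alpha$. The parameter values, number of replications, and seed are arbitrary assumptions.

```python
import random
from statistics import mean, pvariance


def beta_mle(xs, alpha):
    """m.l. estimate of the scale parameter beta when the shape alpha is known."""
    return mean(xs) / alpha


def simulate_beta_mle(alpha, beta, n, reps, seed=0):
    """Return (empirical mean, empirical variance) of beta_mle over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates = [
        beta_mle([rng.gammavariate(alpha, beta) for _ in range(n)], alpha)
        for _ in range(reps)
    ]
    return mean(estimates), pvariance(estimates)


# Hypothetical settings: alpha = 2, beta = 3, samples of size 50.
alpha, beta, n = 2.0, 3.0, 50
avg, var = simulate_beta_mle(alpha, beta, n, reps=4000)
bound = beta ** 2 / (n * alpha)  # Cramér-Rao bound beta^2 / (N alpha)
print(avg, var, bound)
```

With these settings the empirical variance should lie close to the bound 0.09, up to simulation error.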


* 6.5 Approximate Calculation of a Maximum Likelihood Estimator  It
happens sometimes that the method of maximum likelihood leads to equations
which are very troublesome to solve. In such cases we may take a
simpler, but less efficient, estimator and apply a correction to bring it nearer
to the desired form.

If $T$ is the m.l. estimator and $U$ is an estimator which is not quite as efficient
as $T$, we know that $(\partial \log L/\partial\theta)_T = 0$, and $E(\partial^2 \log L/\partial\theta^2)_T = -[V(T)]^{-1}$, at
least for large $N$.

By Taylor's theorem,

$$\left(\frac{\partial \log L}{\partial\theta}\right)_U
  = \left(\frac{\partial \log L}{\partial\theta}\right)_T
  + (U - T)\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_T
  + \text{terms of higher order.}$$

Since we suppose that the quantity $U - T$ is small and since we can approximately
replace $(\partial^2 \log L/\partial\theta^2)_T$ by its expected value, we have, on neglecting the
higher terms,

$$T \approx U + V(T)\left(\frac{\partial \log L}{\partial\theta}\right)_U \tag{6.5.1}$$

The last term on the right-hand side is a correction to be applied to $U$ to bring
it nearer to $T$. The value of $V(T)$ is obtained from Eq. (6.3.2).


EXAMPLE 4  For the Cauchy distribution,

$$f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + (x-\theta)^2}, \qquad -\infty < x < \infty$$

the sample mean is not a good estimator of $\theta$, since it is no better than a single
observation. The sample median may be used and in large samples has a variance
$\pi^2/4N$. The m.l. estimator is given by

$$\log L = -\sum_i \log[1 + (x_i - \theta)^2] - N\log\pi$$

$$\frac{\partial(\log L)}{\partial\theta} = 2\sum_i \frac{x_i - \theta}{1 + (x_i - \theta)^2} = 0$$

As an equation in $\theta$, this gives a polynomial of degree $2N - 1$, which even for
fairly small $N$ is difficult to solve. From Eq. (6.3.2),

$$[V(T)]^{-1} = -N\int_{-\infty}^{\infty}\frac{\partial^2 \log f}{\partial\theta^2}\, f\,dx
  = -\frac{2N}{\pi}\int_{-\infty}^{\infty}\frac{(x-\theta)^2 - 1}{[1 + (x-\theta)^2]^3}\,dx
  = -\frac{4N}{\pi}\int_0^{\infty}\frac{u^2 - 1}{(1 + u^2)^3}\,du
  = \frac{N}{2}$$

so that $V(T) = 2/N$.

The efficiency of the median is therefore $(2/N) \div (\pi^2/4N) = 8/\pi^2$, or about 0.8.
The improved estimator will be the median $U$ plus the correcting term
$V(T)(\partial \log L/\partial\theta)_U$, i.e.,

$$\frac{4}{N}\sum_i \frac{x_i - U}{1 + (x_i - U)^2}$$
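The one-step correction of Example 4 is short enough to write out in code. The following is a minimal sketch under the assumptions of the example (Cauchy population with unit scale, large-sample variance $V(T) = 2/N$); the function names and data are hypothetical and not from the original text.

```python
from statistics import median


def cauchy_score(xs, theta):
    """d(log L)/d(theta) for a Cauchy sample with unit scale."""
    return 2.0 * sum((x - theta) / (1.0 + (x - theta) ** 2) for x in xs)


def one_step_cauchy(xs):
    """Median U plus the correction V(T) * score(U), with V(T) = 2/N as in Eq. (6.5.1)."""
    n = len(xs)
    u = median(xs)
    return u + (2.0 / n) * cauchy_score(xs, u)


# Hypothetical observations.
data = [-3.1, -0.4, 0.2, 0.9, 1.3, 2.0, 14.7]
print(median(data), one_step_cauchy(data))
```

The correction relies on the large-sample approximation behind Eq. (6.5.1), so for very small samples it need not bring the estimate closer to the exact m.l. value.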

6.6 Tests of Hypotheses  There is a large class of statistical problems
concerned with testing whether or not some hypothesis is true. For example, a
machine is turning out thread, and we would like to be reasonably sure that the
breaking strength is at least 100 lb weight; if not, the machine may have to be
re-adjusted. We can take samples of the thread at intervals and test them, but
because of the variability of the product (inherent in the process of manufacture)
the samples will vary among themselves. We can, however, use them to test the
hypothesis ($H_0$) that the mean breaking strength ($\mu$) of the thread produced
is at least 100 lb wt against the alternative hypothesis ($H_1$) that $\mu$ is less than
100 lb wt.

The hypothesis which we set up and proceed to test by experiment is called a
null hypothesis. On the basis of the sample we can take various possible actions.
We can (1) reject the null hypothesis, which in this example may mean
dismantling the machine, (2) accept the null hypothesis, which means that we
happily accept the product of the machine as up to standard, (3) declare that
further experimentation is necessary before we can make a decision. If the size
of the sample is fixed beforehand, this third procedure is not open to us, but
in sequential sampling, as we shall see later, tests are continued until we feel
justified in taking either action (1) or action (2).

In taking such an action on the basis of a sample we run a risk of doing the
wrong thing. Obviously we may commit either of two kinds of error: we may
reject a hypothesis which is really true (this will be called a rejection error, or
an error of the first kind), or we may accept a hypothesis which is really false
(this will be called an acceptance error, or an error of the second kind).

Tests are usually made by computing some statistic (e.g., an arithmetic mean
or a variance) from the observations and noting whether or not this computed
value lies in some particular interval, or set of intervals, previously chosen on the
axis of real numbers. The part of the real axis so chosen is called the region of
rejection, and the hypothesis $H_0$ is rejected if the computed value lies in this
region. Thus, if the population is known to be normally distributed about a
value $\mu$ with unit variance, and if $H_0$ is the hypothesis that $\mu$ is zero, we shall be
inclined to reject this hypothesis if a sample of $N$ observations gives a mean too
far from zero. If $H_0$ were true, the sample mean would depart from zero by as
much as $1.96N^{-1/2}$ in only 5% of random samples of size $N$. By taking as our
region of rejection that part of the real axis outside the bounds $\pm 1.96N^{-1/2}$, we
run a risk of wrongly rejecting $H_0$, but the chance of doing so is only 0.05. By
suitably choosing the region of rejection we can make this chance what we like,
depending on the circumstances of the problem and the consequences of making
a wrong decision.

In some types of problem the test statistic may be a pair of numbers, and
then the region of rejection may be represented by a part of the x-y plane. The
concept can obviously be extended to three or more dimensions. However, in
most of the day-to-day problems of practical statistics the statistic used is
one-dimensional and the region of rejection is an interval or pair of intervals on
the real axis.
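The 5% figure quoted for the bounds $\pm 1.96N^{-1/2}$ can be reproduced from the normal distribution function. The sketch below is an illustration, not part of the original text; the helper names and the choice $N = 25$ are assumptions. It evaluates the probability that the mean of $N$ unit-variance normal observations falls outside the bounds when $\mu = 0$.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_sided_rejection_probability(bound, n):
    """P(|sample mean| > bound) when the mean of n unit-variance observations is zero."""
    z = bound * sqrt(n)  # the sample mean has standard deviation n**-0.5
    return 2.0 * (1.0 - normal_cdf(z))


n = 25  # hypothetical sample size
bound = 1.96 / sqrt(n)
print(two_sided_rejection_probability(bound, n))  # close to 0.05 for any n
```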


6.7 Simple and Composite Hypotheses  An hypothesis which is equivalent
to a complete specification of the distribution is said to be simple. Otherwise, it
is composite. Thus, if a population is known to be normal and to have variance
$\sigma^2$, the hypothesis that the mean is $\mu_0$ is a simple one, since the mean and
variance together specify a normal distribution completely. The alternative
hypothesis could be simple also, if it were known, for instance, that the
population mean must be either $\mu_0$ or $\mu_1$ and could not have any other value. More
usually, the alternative would be composite. It could be a two-sided alternative
(namely, that $\mu$ is either less than or greater than $\mu_0$), or it could be a one-sided
alternative (that $\mu > \mu_0$, for instance, supposing that we have good reason to
believe that it cannot possibly be less). We may, for example, want to know
whether a new kind of fertilizer, applied in a particular way, will increase the
yield of a crop, but feel quite certain that it will not actually diminish the yield.
It would be reasonable in this case to use a one-sided alternative.


FIG. 33  ERRORS OF THE FIRST AND SECOND KIND


6.8 The Size and Power of a Test  Suppose we want to test the simple null
hypothesis ($H_0$), that $\theta = \theta_0$, against the simple alternative hypothesis ($H_1$), that
$\theta = \theta_1$, by means of a test statistic $T$. In order to fix our region of rejection we
shall need to know the density function of $T$ when $\theta = \theta_0$, say $g(t|\theta_0)$. In
general, $g$ will depend on the sample size $N$, and the region of rejection ($R$) will
be an interval on the $t$-axis (e.g., the interval $t > t_1$ in Figure 33).

If the probability that $T$ falls in $R$, when $H_0$ is true, is $\alpha$, then $\alpha$ is the
probability of committing a rejection error (that is, of rejecting $H_0$ when it is true).
This probability is often called the size of the test. It is given by

$$\alpha = \int_{(R)} g(t|\theta_0)\,dt \tag{6.8.1}$$

and is represented by the heavily-shaded area in Figure 33. In practice, $R$ is
usually chosen so that $\alpha$ is 0.05 or 0.01, although of course any convenient size
can be selected.

If the statistic $T$ is discrete, the integral will be replaced by a sum, and it will
not generally be possible to make the size exactly 0.05 or other preassigned value.
The region of rejection will be a set of values of $T$, the probabilities of which
add up to something near the required $\alpha$.

If the alternative hypothesis $H_1$ is true, there will be a different distribution
with density $g(t|\theta_1)$. The probability of committing an acceptance error
(that is, of accepting $H_0$ if $H_1$ is true) is given by

$$\beta = \int_{(A)} g(t|\theta_1)\,dt \tag{6.8.2}$$

where $A$ is the region of acceptance (all possible values of $t$ outside of $R$). This
probability is represented by the lightly-shaded area in Figure 33. The power of
the test is defined by

$$P = 1 - \beta = \int_{(R)} g(t|\theta_1)\,dt \tag{6.8.3}$$

It is the probability of rejecting $H_0$ if $H_1$ is true (that is, if $H_0$ should be rejected).
Obviously, we would like a test to be as powerful as possible for the same size.

The power depends, of course, on $\theta_1$. If $\theta$ is any value of $\theta_1$, $P(\theta)$ is called the
power function of the test for $\theta_0$ against $\theta$. If $\theta$ is near to $\theta_0$, the power will
usually be small, and if $\theta = \theta_0$ it becomes equal to $\alpha$. For $\theta$ far removed from $\theta_0$
the power will usually be near to 1, since any reasonable test should be able to
decide between very different hypotheses. The ideal power function would be
something like the one sketched in Figure 34, in which $\alpha = 0$ and $P(\theta) = 1$ for
all $\theta$ not equal to $\theta_0$. Both kinds of error would then be zero, but such ideal
tests are not available in practice.

[Fig. 34: ideal power function, $P(\theta)$ plotted against $\theta$]

An actual power function will be more like that depicted in Fig. 35. A
low power can be tolerated for $\theta$ near $\theta_0$, since no great harm is done if we do
make the mistake of accepting $\theta_0$ instead of $\theta$. Where $\theta$ differs considerably from
$\theta_0$, so that it might be a serious matter to mistake one for the other, the power
is near 1 and the acceptance error $\beta$ is small.

The function $\beta(\theta) = 1 - P(\theta)$ is often used instead of the power function,
particularly in industrial practice, and is called the operating characteristic
(O.C.) of the test. The graph of the O.C. is like that of the power function
turned upside-down, with 0 and 1 interchanged.


FIG. 35  POWER FUNCTIONS OF ALTERNATIVE TESTS


The two types of error that we have defined depend on conditional
probabilities, the probability that $T$ falls in $R$ when $H_0$ is true (denoted by $T \in R|H_0$)
or the probability that $T$ falls in $A$ when $H_1$ is true (denoted by $T \in A|H_1$). If we
assert, on the basis of our observations, that $H_0$ is not true, the chance that we
are wrong depends not only on these conditional probabilities but also on the
prior probability (previous to our observations) that $H_0$ actually is true. If
$p_0$ is the prior probability of $H_0$, it follows from the rules for probability
calculations in Chapter 1 that the chance of being wrong in rejecting $H_0$ is given by

$$P(T \in R|H_0)\cdot p_0 = \alpha p_0$$

and the chance of being wrong in accepting $H_0$ is

$$P(T \in A|H_1)\cdot(1 - p_0) = \beta(1 - p_0)$$

where $A$ is the region of acceptance (the whole domain of $T$ outside of $R$). If
the null hypothesis $H_0$ that we choose to test is one that has a small prior
probability of being true, the chance of being wrong in rejecting it may be much
less than the size of the test $\alpha$.

It is often possible to choose the region of rejection $R$ in different ways, even
though the size $\alpha$ remains constant. Each choice of $R$ will give rise to a different
power function. Figure 35 illustrates the possibility that the test using a region
$R'$ may be more powerful than the test using $R$ for all values of $\theta_1 > \theta_0$, but may
be less powerful when $\theta_1 < \theta_0$.

If for every value of $\theta_1$ except $\theta_0$, the power curve of $R'$ is below that of $R$, the
test using $R$ is said to be uniformly more powerful than that using $R'$. If this
holds for all possible choices of $R'$ then the test using $R$ is a uniformly most
powerful test. In a few cases such tests have been found.


6.9 The Neyman-Pearson Theorem  Let $X$ be a random variable with density
function $f(x)$. (If $X$ is discrete, the necessary modifications in the proof can
easily be made.) We suppose that $f(x)$ depends on a parameter $\theta$ for which we
would like to test the simple hypothesis $H_0$ (that $\theta = \theta_0$) against the simple
hypothesis $H_1$ (that $\theta = \theta_1$). The test consists of rejecting $H_0$ if the observed
value of $X$ lies in a region $R$ and accepting $H_0$ otherwise. The size of the test is

$$\alpha = \int_{(R)} f(x|\theta_0)\,dx \tag{6.9.1}$$

and the power is

$$P = \int_{(R)} f(x|\theta_1)\,dx \tag{6.9.2}$$

Suppose now that $R'$ is any other region of the domain of $X$ for which

$$\int_{(R')} f(x|\theta_0)\,dx \le \alpha \tag{6.9.3}$$

If for every such $R'$ it is true that

$$\int_{(R')} f(x|\theta_1)\,dx \le \int_{(R)} f(x|\theta_1)\,dx \tag{6.9.4}$$

then $R$ is a most-powerful test, of size not greater than $\alpha$, for testing $\theta_0$ against
$\theta_1$. Neyman and Pearson [3] proved that if a region $R$ exists satisfying (1) and
such that $x$ belongs to $R$ whenever

$$\frac{f(x|\theta_0)}{f(x|\theta_1)} < c \tag{6.9.5}$$

where $c$ is some constant, and does not belong to $R$ whenever this ratio $> c$,
then $R$ is a most-powerful test of size not greater than $\alpha$. The ratio in (5) is called
the likelihood ratio, and will be denoted by $L(x)$.

As well as merely distinguishing between two fixed values $\theta_0$ and $\theta_1$, the
likelihood ratio test applies more generally. Thus, suppose the possible values of
$\theta$ form a set $\Omega$ (which may, for example, be the interval 0 to 1, or the interval
$-\infty$ to $\infty$). The null hypothesis $H_0$ may specify that $\theta$ belongs to some subset
$\omega$ of $\Omega$ (for instance, the single value 0.5, or the interval from 0.4 to 0.6) and $H_1$
is then the hypothesis that $\theta$ belongs to $\Omega - \omega$. The likelihood ratio is defined as
the ratio of the maximum likelihood under $H_0$ to the maximum likelihood under
$H_1$:

$$L(x) = \frac{\max_{\theta \in \omega} f(x|\theta)}{\max_{\theta \in \Omega - \omega} f(x|\theta)} \tag{6.9.6}$$

If $L(x)$ is small, the observed $x$ will be more likely under $H_1$ than under
$H_0$, so that it would be unreasonable to maintain $H_0$. The test consists in
rejecting $H_0$ when $L(x) < c$, $c$ being such that

$$P(L(x) < c|H_0) = \alpha \tag{6.9.7}$$

Many useful tests in statistics are likelihood ratio tests. The statistic $X$
may consist of a set of $N$ independent observations, forming a random sample,
and the likelihood will then be a joint probability density for the $N$ observations.
When $N$ is large, the distribution of $-2\log L(X)$ under $H_0$ is approximately a
chi-square distribution with degrees of freedom depending on the number of
parameters concerned (one, in the case discussed above). This was shown by
Wilks [4]. If $H_1$ is true, the distribution of $-2\log L(X)$ is approximately
non-central chi-square (see Appendix A.13). Tables of the non-central chi-square
distribution may be found in [5].

When the parent population is normal, as we shall see in the next section, the
chi-square distribution of $-2\log L(X)$ holds exactly, even for $N = 1$.


6.10 The One-Sided Normal Test  Suppose the population is normal, with
known variance $\sigma^2$, and suppose we wish to distinguish between two possible
values of the mean, $\mu_0$ and $\mu_1$, where $\mu_1 > \mu_0$.

The test statistic is the sample mean $m$ computed from $N$ observations. Since
$m$ is normally distributed with mean $\mu$ ($\mu$ is either $\mu_0$ or $\mu_1$) and variance $\sigma^2/N$,

$$f(m|\mu) = \left(\frac{N}{2\pi\sigma^2}\right)^{1/2} e^{-N(m-\mu)^2/2\sigma^2} \tag{6.10.1}$$

The likelihood ratio is

$$L(m) = \frac{f(m|\mu_0)}{f(m|\mu_1)}
  = \exp\left\{\frac{N}{2\sigma^2}\left[(m - \mu_1)^2 - (m - \mu_0)^2\right]\right\}
  = \exp\left[\frac{N}{2\sigma^2}(\mu_1 - \mu_0)(\mu_1 + \mu_0 - 2m)\right] \tag{6.10.2}$$

This will be less than some positive constant $c$ if $\mu_1 + \mu_0 - 2m < c_1$, where
$c_1$ is another constant depending on $c$ and on the known quantities $\mu_1$, $\mu_0$, $\sigma^2$
and $N$. Actually $c_1 = 2\sigma^2 \log c\,/\,[N(\mu_1 - \mu_0)]$. The relation $\mu_1 + \mu_0 - 2m < c_1$
implies

$$m > \frac{\mu_1 + \mu_0 - c_1}{2} = c_2$$

so that the test reduces to rejecting $H_0$ (that $\mu = \mu_0$) if the sample mean is
greater than a certain value, $c_2$. This value can be determined by deciding on the
size of the test.

If $\alpha$ is the probability that $m > c_2$, given that $\mu = \mu_0$,

$$\alpha = \int_{c_2}^{\infty} f(m|\mu_0)\,dm \tag{6.10.3}$$

By the transformation $(m - \mu_0)N^{1/2}/\sigma = v$, this becomes

$$\alpha = (2\pi)^{-1/2}\int_{v_0}^{\infty} e^{-v^2/2}\,dv = 1 - \Phi(v_0) \tag{6.10.4}$$

where $v_0 = (c_2 - \mu_0)N^{1/2}/\sigma$ and $\Phi(v)$ is the cumulative distribution function for
the normal law.

If we choose $\alpha = 0.05$, we find from the tables of the normal law that
$v_0 = 1.645$, so that

$$c_2 = \mu_0 + 1.645\,\sigma N^{-1/2} \tag{6.10.5}$$

The test therefore consists in rejecting $H_0$ in favor of $H_1$ if $m > \mu_0 + 1.645\,\sigma N^{-1/2}$.
The power of this test is given by

$$P = \int_{c_2}^{\infty} f(m|\mu_1)\,dm = 1 - \Phi(v_1) \tag{6.10.6}$$

where $v_1 = (c_2 - \mu_1)N^{1/2}/\sigma = 1.645 - (\mu_1 - \mu_0)N^{1/2}/\sigma$. Thus, if $\mu_1 - \mu_0
= 0.3\sigma$ and $N = 9$, the power is $1 - \Phi(0.745) = 0.228$. With a sample of size 9
there is therefore a probability equal to 0.228 of detecting a difference $\mu_1 - \mu_0$
as great as 0.3 of the standard deviation, if it is known that this difference is
positive. This is a one-sided test.
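The power calculation in this example can be reproduced as follows. This is a minimal sketch, not part of the original text: the function names are hypothetical, the difference is expressed in units of the standard deviation as $\delta = (\mu_1 - \mu_0)/\sigma$, and the normal distribution function is obtained from the error function rather than from tables.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal law."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def one_sided_power(delta, n, z_alpha=1.645):
    """Power 1 - Phi(v1), with v1 = z_alpha - delta * sqrt(n), as in Eq. (6.10.6).

    delta is (mu1 - mu0) / sigma; z_alpha = 1.645 corresponds to a test of size 0.05.
    """
    v1 = z_alpha - delta * sqrt(n)
    return 1.0 - normal_cdf(v1)


print(round(one_sided_power(0.3, 9), 3))    # the case worked in the text
print(round(one_sided_power(0.3, 100), 3))  # a larger, hypothetical sample size
```

Setting `delta = 0` returns the size of the test, which is the value of the power function at $\mu = \mu_0$.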

It may be noted that this test can be regarded as a test of the simple hypothesis
$H_0$ against the composite alternative $H_1$ (that $\mu > \mu_0$). The power is then a
function of $\mu$. The set $\Omega$ of possible values of $\mu$ is the set of real numbers $\ge \mu_0$,
and the set $\omega$ consists of the single number $\mu_0$.

The likelihood ratio is

$$L(m) = \frac{f(m|\mu_0)}{\max_{\mu > \mu_0} f(m|\mu)} \tag{6.10.7}$$

where $f(m|\mu)$ is given by Eq. (1). This density, $f(m|\mu)$, is a maximum for
variations in $\mu$ when the exponential factor is equal to 1, that is, when $\mu = m$.
If the sample mean $m$ should happen to be less than $\mu_0$, the maximum would be
when $\mu$ is arbitrarily near to $\mu_0$. Therefore, $L(m) = 1$ if $m \le \mu_0$, and if $m > \mu_0$,

$$L(m) = e^{-N(m - \mu_0)^2/2\sigma^2} \tag{6.10.8}$$

This is less than $c$ if $m - \mu_0 > c_1$, and therefore if $m > c_2$. The number $c_2$ is
determined by

$$\alpha = \int_{c_2}^{\infty} f(m|\mu_0)\,dm = 1 - \Phi(v_0) \tag{6.10.9}$$

with $v_0 = (c_2 - \mu_0)N^{1/2}/\sigma$.

The test consists in rejecting $H_0$ if $m > c_2$. If $m$ should be less than $\mu_0$, $H_0$
will naturally be accepted. The power is given by

$$P(\mu) = \int_{c_2}^{\infty} f(m|\mu)\,dm = 1 - \Phi(v) \tag{6.10.10}$$

with $v = (c_2 - \mu)N^{1/2}/\sigma$. Since $-2\log L(m)$, from Eq. (8), is equal to
$N(m - \mu_0)^2/\sigma^2$, which on hypothesis $H_0$ is the square of a standard normal variate,
it follows that in this example $-2\log L(m)$ has a chi-square distribution with
one degree of freedom, regardless of the value of $N$.


6.11 The Two-Sided Normal Test If the null hypothesis Но is that H = Uo 
(for a normal parent population with variance c?), and the two-sided alternative 
H, is that either и > де or и < ро, we shall have, for a test based on the sample 
mean, 


(6.11.1) L(m) = е №" H0)?/20? 


which is less than c if |m — | > cy. 
The region of rejection therefore consists of two parts—from — оо to 
Ho — c, and from ро + c, to œ. For a given а, с, is fixed by the relation 


(6.11.2) MEL dm +f f(m|ug) dm = а 


Hote, 
Because of the symmetry of the normal distribution, both integrals are equal to 
a/2, and as in $ 6.10 we find 
(6.11.3) 0]2 =1 — (v), v=c,N*/o 
For х = 0.05, с, = 1.960N ^!?, The power is given by 


(6.11.4) Р(и) 21 -[ ° "ти dm 
Ho-e1 
—1- 9 (v1) + Ф (vo) 
where 
NU? 


с 


и: = (Ho + c, — и) = 1.96 — М0 - Ки — Ho) 


6.12 ESTIMATION, TESTING AND DECISION MAKING 139 


and 
NU 
с 


vo = 


(цо — € — н) = — 1.96 — №207 Ци — po) 


EXAMPLE 5  What size of sample is necessary in order to detect with probability
0.8 a difference between the population mean and the assumed value $\mu_0$
amounting to as little as $0.2\sigma$, given that the probability of rejection error (of
stating that a difference exists when in fact it does not) is 0.05?

Here $c_1 = 1.96\,\sigma N^{-1/2}$,

$P(\mu) = 1 - \Phi(v_1) + \Phi(v_0) = 0.8$

with

$v_1 = 1.96 \pm 0.2N^{1/2}$

$v_0 = -1.96 \pm 0.2N^{1/2}$

(The two plus signs, or the two minus signs, go together.) For fairly large $N$
(taking the plus signs), $\Phi(v_1) \approx 1$, $\Phi(v_0) \approx 0.8$, so that $v_0 = 0.8416$, giving
$N = 196$. With this value, $v_1 = 4.76$, and $\Phi(v_1)$ is certainly close enough to 1.
A sample size of 196 will therefore give the required power. The same result
follows if we use the minus signs, with $\Phi(v_0) \approx 0$, $\Phi(v_1) \approx 0.2$, and $v_1 = -0.8416$.
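
The arithmetic of Example 5 can be checked with a short Python sketch. It evaluates the power formula (6.11.4) directly and also the large-$N$ approximation used in the example. The function names are illustrative assumptions, and the shift is expressed in units of $\sigma$ (here 0.2).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_sided_power(n, shift_in_sigmas, alpha=0.05):
    """Power 1 - Phi(v1) + Phi(v0) of the two-sided test, Eq. (6.11.4)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    d = sqrt(n) * shift_in_sigmas           # N^(1/2) (mu - mu0) / sigma
    return 1 - STD_NORMAL.cdf(z - d) + STD_NORMAL.cdf(-z - d)

def approximate_sample_size(shift_in_sigmas, power, alpha=0.05):
    """Large-N approximation that neglects the far tail, as in Example 5."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)     # 0.8416 for power 0.8
    return ((z_alpha + z_power) / shift_in_sigmas) ** 2

n_approx = approximate_sample_size(0.2, 0.8)
print(round(n_approx, 1))                   # close to 196
print(round(two_sided_power(196, 0.2), 4))  # close to 0.8
```

With unrounded normal quantiles the approximation comes out marginally above 196, so the power at $N = 196$ is very slightly under 0.8; the difference is immaterial at the accuracy of the example.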


* 6.12 The Randomized Neyman-Pearson Theorem  It is possible to increase
the power of a test, in certain circumstances, by allowing a randomized decision.
The total domain of the statistic $X$ is divided into three parts: $R$, $A$, and $D$. If
the observed $x$ falls in $R$, $H_0$ is rejected, and if it falls in $A$, $H_0$ is accepted, but
there is also a doubtful region $D$. If $x$ falls in $D$ we toss a coin or draw a card or
consult a table of random numbers; that is, we employ some randomizing
procedure which gives us a known probability of rejecting $H_0$.
We can define a test function $\psi(x)$ by letting $\psi(x) = 1$ if $x \in R$, $\psi(x) = 0$ if
$x \in A$ and $\psi(x) = \psi_0$ if $x \in D$, $\psi(x)$ being in all cases the probability of rejection
of $H_0$ and $\psi_0$ being a number between 0 and 1.
The randomized Neyman-Pearson theorem states that if $L(x)$ is defined as in
(6.9.5), and if

            $\psi(x) = 1$ when $L(x) < c$
(6.12.1)    $\psi(x) = 0$ when $L(x) > c$
            $\psi(x) = \psi_0$ when $L(x) = c$

then the test with test-function $\psi(x)$ is most powerful of size $\alpha$ for testing $H_0$
against $H_1$. The value of $c$ is determined by

(6.12.2)    $P(L(X) < c \mid H_0) \le \alpha$

and the value of $\psi_0$ by

(6.12.3)    $P(L(X) < c \mid H_0) + \psi_0 P(L(X) = c \mid H_0) = \alpha$

If the region $D$ includes only a single value of $x$, $\psi_0$ is uniquely determined, but
in other cases several values of $\psi_0$ may be found to satisfy this equation.


EXAMPLE 6  Suppose we want to test the hypothesis that the proportion of
defectives in a large lot of manufactured articles is not more than 10%, and we
decide to do so by taking a sample of four items and noting the number ($X$) of
defectives. Clearly $X$ can take only the values 0, 1, 2, 3 or 4, and the larger $X$ is,
the more readily we shall reject the hypothesis.

The probability of exactly $x$ defectives is

(6.12.4)    $f(x \mid \theta) = \binom{4}{x}\theta^x(1 - \theta)^{4-x}$

If $\theta = 0.10$, this expression, for $x = 2, 3, 4$, takes values 0.0486, 0.0036 and
0.0001, respectively. If $\theta < 0.10$, these values will be still smaller. We might
therefore take as the region of rejection the set of values $x = 3$ and $x = 4$, and
the rejection error will be

(6.12.5)    $\sum_{(R)} f(x \mid \theta) \le 0.0037, \quad \theta \le 0.10$

If, however, we include $x = 2$ in $R$, we have

$\sum_{(R)} f(x \mid \theta) \le 0.0523, \quad \theta \le 0.10$

and, at least for some values of $\theta$, the size of the test will be greater than 0.05.
The non-randomized test would therefore tell us to reject $H_0$ if $X = 3$ or 4, and
accept $H_0$ if $X = 0$, 1 or 2. The power of this test is

(6.12.6)    $P(\theta) = \sum_{x=3}^{4} f(x \mid \theta) = \theta^4 + 4\theta^3(1 - \theta)$

For $\theta = 0.20$, this is 0.027.

Suppose now we use a randomized test, and decide to reject $H_0$ with probability
$\psi_0$ when $X = 2$. The probability of this under $H_0$ is 0.0486, so that
Eq. (3) gives, for $\alpha = 0.05$,

(6.12.7)    $0.0037 + 0.0486\,\psi_0 = 0.05$

Therefore, $\psi_0 = 0.95$. The randomized test is:

    reject $H_0$ if $X = 3$ or 4
    accept $H_0$ if $X = 0$ or 1
    reject $H_0$ with probability 0.95 if $X = 2$

A way of rejecting $H_0$ with probability 0.95 would be to use a table of random
two-digit numbers. Before opening the table, decide arbitrarily on a particular
page, a particular column, and a particular position in the column (say seventh
from the top). Then look up the number. If it lies between 00 and 94, inclusively,
reject $H_0$.
The power of this randomized test is

(6.12.8)    $P(\theta) = \sum_{x=3}^{4} f(x \mid \theta) + 0.95 f(2 \mid \theta) = \theta^4 + 4\theta^3(1 - \theta) + 5.70\,\theta^2(1 - \theta)^2$

For $\theta = 0.20$ this is 0.173, greater than for the non-randomized test.
In the above example we did not need to find the likelihood ratio, but we
can easily do so. The maximum of Eq. (4) under $H_0$ is $\binom{4}{x}(0.10)^x(0.90)^{4-x}$ if
$x > 0$, or 1 if $x = 0$, and the maximum under $H_1$ is $\binom{4}{x}\left(\frac{x}{4}\right)^x\left(1 - \frac{x}{4}\right)^{4-x}$ (given
by $\theta = x/4$) if $x > 0$, or $(0.90)^4$ when $x = 0$. Therefore $L(x) = (10/9)^4$ when
$x = 0$ and $L(x) = \left(\frac{0.40}{x}\right)^x\left(\frac{3.60}{4 - x}\right)^{4-x}$ when $x > 0$. For $x = 2$, this is 0.1296,
which is the $c$ of Eqs. (2) and (3). The probability that $L(X) = c$ is the same as
the probability that $X = 2$.
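
The figures quoted in Example 6 can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative assumptions, and the likelihood ratio is coded for $x > 0$ only, following the expression given above.

```python
from math import comb

def f(x, theta, n=4):
    """Binomial probability of exactly x defectives in a sample of n, Eq. (6.12.4)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

def randomized_power(theta, psi0):
    """Reject for X = 3 or 4, and with probability psi0 when X = 2."""
    return f(3, theta) + f(4, theta) + psi0 * f(2, theta)

def likelihood_ratio(x):
    """L(x) for x > 0: maximum with theta <= 0.10 over the maximum at theta = x/4."""
    return f(x, 0.10) / f(x, x / 4)

alpha = 0.05
psi0 = (alpha - f(3, 0.10) - f(4, 0.10)) / f(2, 0.10)  # solves Eq. (6.12.7)

print(round(psi0, 4))                          # unrounded psi0, about 0.95
print(round(randomized_power(0.20, 0.0), 4))   # non-randomized test at theta = 0.20
print(round(randomized_power(0.20, 0.95), 4))  # randomized test at theta = 0.20
print(round(likelihood_ratio(2), 4))           # the constant c
```

Setting `psi0` to zero in `randomized_power` recovers the non-randomized test, which makes the gain in power from randomization easy to see.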


EXAMPLE 7  Suppose the null hypothesis $H_0$ is that $X$ is a random variable
with a rectangular distribution of mean 2 and range 2, and the alternative $H_1$ is
that $X$ has a rectangular distribution of mean 4 and range 4. It is clear from
Figure 36 that $H_0$ must be accepted when $1 < x < 2$ and rejected when
$3 < x < 6$. The only doubt arises where the two distributions overlap, for
$2 < x < 3$ (the region $D$). Evidently, $L(x) = \infty, 2, 0$ for the regions $A$, $D$ and
$R$. If in the region $D$ the probability of rejection is $\psi_0$, we have

$\alpha = P(3 < x < 6 \mid H_0) + \psi_0 P(2 < x < 3 \mid H_0) = 0 + \psi_0(1/2)$

If $\alpha = 0.05$, $\psi_0 = 0.10$. The power of the test is

$P = P(3 < x < 6 \mid H_1) + \psi_0 P(2 < x < 3 \mid H_1) = \frac{3}{4} + 0.1\left(\frac{1}{4}\right) = \frac{31}{40}$

FIG. 36  RANDOMIZED TEST

The value of $\psi_0$ is here not unique. If we take $\psi_0(x) = (x - 2)/5$, $2 < x < 3$, this
will give the same size and power as $\psi_0 = 0.10$. The error $\alpha$ is now determined by

$\alpha = P(3 < x < 6 \mid H_0) + \int_2^3 \psi_0(x) f(x \mid H_0)\,dx = \int_2^3 \frac{x - 2}{10}\,dx = 0.05$

where $f(x \mid H_0) = 1/2$ is the density under $H_0$.

6.13 Statistical Decisions and Risk The practical problem of the statistician 
is usually that of making a decision in the face of uncertainty. The problem may 
be one of deciding on the best value to use for some characteristic of a popu- 
lation, such as the variance, or it may involve deciding between alternative 
hypotheses. The decision may require action, for example, accepting or rejecting 
a lot of manufactured articles after having inspected a random sample, or 
recommending the use of a particular fertilizer for increasing the expected yield 
of a crop, after analysing some experimental results. We should like to have a 
sound guiding principle for use in making such decisions, but we must not expect 
too much from any single principle. The circumstances of a particular problem 
will be all-important. 

There are two general decision principles which have been quite widely 
used, one associated with the names of Bayes and Laplace, the other due to 
Abraham Wald, although these are by no means the only possibilities. The 
Bayes rule is to choose that course of action which has the largest expectation of 
gain (or, which comes to the same thing, the smallest expectation of loss). This 
rule assumes that we know, or can estimate, the prior probabilities of the 
various possible situations with which we may be faced. The Wald, or minimax, 
principle is to choose that action which minimizes the maximum loss that could 
occur in the worst possible case. This is evidently a rather pessimistic attitude, 
but it does minimize the risk of a disastrous loss. 

Both principles require the person making the decision to give numerical 
values to the gains, or losses, which will ensue from the various possible actions. 
Sometimes this is a fairly straightforward matter of cost accounting, and the 
values can be given in dollars and cents. If the problem is concerned with accept- 
ing or rejecting a lot of manufactured articles, on the basis of some sampling 
scheme, it will generally be possible to estimate fairly accurately the costs of 
sampling and inspection of individual items, and also the losses involved in 
accepting a poor lot or rejecting a good one. If, however, the problem is to decide
between alternative medical treatments of a disease, the error of saying that a 
proposed new treatment is no better than the old one, when in fact it is better, 
may cost lives which might have been saved had the new treatment been adopted. 
Even when the gain is monetary, it may be argued that its value is different in 
different circumstances. A sum of $50 does not look the same to a millionaire
and to a hobo. Economists have attempted to make a scale of "utility" to
measure satisfactions and preferences, and where costs, or losses and gains, are
mentioned later in this book the units may, if desired, be understood as units of
utility. 

The risk of making a wrong decision may often be very greatly reduced by 
taking a large number of observations, but sampling and experimentation cost 
money, or at least time and effort, and this should be reckoned in the total 
accounting. Wald therefore introduced a risk function which depends partly on 
the decision made or the action taken, and partly on the cost of experimenting 
so as to have a basis for decision. 

Suppose we base our decision to take one of $k$ possible actions $a_1, a_2, \ldots, a_k$ on
a single sample of $N$ observations of a variate $X$. Let the cost be $c_N$, a bounded,
non-negative number depending on $N$ and possibly on the actual set of observations.
(If all observations cost the same, $c_N$ is proportional to $N$.) The
probability of action $a_i$ will depend on the decision rule $d$ which is used, and
may be denoted by $p(a_i \mid d)$, and $d$ of course depends on the set of observations
$x_1, \ldots, x_N$. There will be a joint likelihood function $f(x_1, \ldots, x_N)$ for any given
set of values of $X$, and this function will generally depend on one or more
parameters, the values of which represent the unknown state of Nature. For
convenience we will suppose that there is only one parameter, $\theta$, which can take
a set of values symbolised by $\Omega$.

If the statistician takes action $a_i$ when $\theta$ is really equal to $\theta_j$, we can suppose
that his loss is $L(a_i, \theta_j)$. Wald regarded this loss as always non-negative, and
equal to zero when the best possible decision in the circumstances is made. Any
other decision involves a positive loss. The expected loss for the given set of
observations is

(6.13.1)    $E[L(a_i, \theta_j)] = \sum_{i=1}^{k} L(a_i, \theta_j)\,p(a_i \mid d)$

and the expected loss, whatever the sample observations may turn out to be, is

(6.13.2)    $r_1(\theta_j) = \sum_{i=1}^{k} \int L(a_i, \theta_j)\,p(a_i \mid d)\,f(x)\,dx$

where $f(x)\,dx$ is written for $f(x_1, \ldots, x_N)\,dx_1 \cdots dx_N$ and the integral is over the
whole $N$-dimensional sample space.
The expected cost of the observations will be

(6.13.3)    $r_2(\theta_j) = \int c_N f(x)\,dx$

since this cost will not depend on the subsequent action $a_i$. The risk function is
the sum of $r_1$ and $r_2$, namely,

(6.13.4)    $r(\theta_j) = r_1(\theta_j) + r_2(\theta_j)$


EXAMPLE 8  A zoologist wants to estimate the average number ($\mu$) of a
particular organism per unit volume in the water of a lake. He takes a sample
of volume $v$ and counts the number of such organisms ($X$) in this volume. He
estimates $\mu$ by the ratio $x/v$ ($= m$), where $x$ is the observed value of $X$.
The distribution of $X$ may be assumed to be Poisson, so that

(6.13.5)    $P(X = x \mid \mu) = (\mu v)^x \frac{e^{-\mu v}}{x!}$

If $m$ is equal to $\mu$, the estimate is correct and there is presumably no loss.
The loss will depend on the size of the error. It can hardly depend on the first
power of the error, since if it did there would be a negative loss (a gain) when the
error was in one direction. The simplest thing is to suppose that the loss is
proportional to $(m - \mu)^2$. The loss due to estimating $\mu$ as $m$ is, therefore,

$L(m, \mu) = k(m - \mu)^2$

The expected loss, whatever the observed $x$, is

(6.13.6)    $r_1(\mu) = \sum_{x=0}^{\infty} k\left(\frac{x}{v} - \mu\right)^2 (\mu v)^x \frac{e^{-\mu v}}{x!}$

since this is a discrete distribution and the integral therefore becomes a sum.
The cost of the sample $c$ may be added to $r_1$ to give the risk function.
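
The sum in (6.13.6) can be evaluated numerically. Since the variance of a Poisson variate equals its mean $\mu v$, the sum should equal $k\mu/v$, and the sketch below compares the two. The closed form is a standard consequence of the Poisson variance rather than a statement from the text, and the function name and the values of $\mu$, $v$ and $k$ are assumptions chosen for illustration.

```python
from math import exp

def expected_loss(mu, v, k=1.0, tol=1e-16):
    """Sum Eq. (6.13.6) term by term until the Poisson tail is negligible."""
    lam = mu * v              # Poisson mean of the count X
    prob = exp(-lam)          # P(X = 0)
    total, x = 0.0, 0
    while x <= lam or prob > tol:
        total += k * (x / v - mu) ** 2 * prob
        x += 1
        prob *= lam / x       # P(X = x) obtained from P(X = x - 1)
    return total

mu, v, k = 3.0, 2.0, 1.0      # hypothetical values
print(expected_loss(mu, v, k), k * mu / v)
```

The comparison shows the expected loss falling in inverse proportion to the volume sampled, which is what has to be weighed against the cost of the sample.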


6.14 Bayes' Principle  This principle assumes that there is a prior probability
distribution for the unknown parameter $\theta$ (which we may think of as a
state of Nature). We will denote this probability density by $p_\theta$. One hypothesis
we might make regarding Nature is that $\theta$ belongs to the set $\omega$ (a subset of the
set $\Omega$ of all possible values of $\theta$). We investigate this hypothesis by taking a set of
observations, which have values $x_1, x_2, \ldots, x_N$ (collectively denoted by $x$). The
probability of obtaining this set with $\theta$ belonging to $\omega$ is $\int_\omega P(x \mid \theta)\,p_\theta\,d\theta$, the
integration (or sum) being over all values of $\theta$ such that $\theta$ belongs to $\omega$. The
probability of the same set, whatever the value of $\theta$, is $\int_\Omega P(x \mid \theta)\,p_\theta\,d\theta$. Therefore, the
probability that $\theta$ belongs to $\omega$, given the observed set of values $x$, is

(6.14.1)    $P(\theta \in \omega \mid x) = \frac{\int_\omega P(x \mid \theta)\,p_\theta\,d\theta}{\int_\Omega P(x \mid \theta)\,p_\theta\,d\theta}$

This rule was first clearly stated by Bayes [6], and used by him for reasoning back
from the observed sample to the population sampled. Bayes recognized, however,
that the use of this rule of inference depends upon knowing the prior
probabilities $p_\theta$, and except in artificial illustrations we seldom know much about
these quantities. Bayes suggested, although apparently with some misgivings,
that if we know nothing whatever about $p_\theta$ we should assume as a basis for
action that all possible values of $\theta$ are equally likely. This suggestion was adopted,
rather uncritically, by Laplace, but it was so vigorously attacked in recent times
(mainly by Fisher) that the rule fell into disrepute. It is now beginning to be
generally recognized that Bayes' approach may be very helpful in certain
situations.
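
When $\theta$ can take only a finite number of values, the integrals in (6.14.1) become sums and the rule is easily applied by machine. The following Python sketch is an illustration under assumed data: three candidate values of $\theta$, a uniform prior, and three successes observed in four binomial trials. None of these numbers, nor the function name, comes from the text.

```python
from fractions import Fraction
from math import comb

def posterior_probability(thetas, prior, likelihood, in_omega):
    """P(theta in omega | x) from Eq. (6.14.1), with sums in place of integrals."""
    weights = [likelihood(t) * p for t, p in zip(thetas, prior)]
    numerator = sum(w for t, w in zip(thetas, weights) if in_omega(t))
    return numerator / sum(weights)

# Hypothetical illustration: x = 3 successes in 4 trials, uniform prior.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = [Fraction(1, 3)] * 3

def likelihood(t):
    return comb(4, 3) * t**3 * (1 - t)

result = posterior_probability(thetas, prior, likelihood, lambda t: t >= Fraction(1, 2))
print(result)
```

Replacing `prior` by any other set of weights shows directly how strongly the conclusion depends on the prior probabilities.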
If the loss function for action $a_i$ and state of Nature $\theta$ is $L(a_i, \theta)$, and if the
prior probability density of $\theta$ is $p_\theta$, the expected loss associated with $a_i$ is

(6.14.2)    $\bar{L}(a_i) = \int_{(\Omega)} L(a_i, \theta)\,p_\theta\,d\theta$

The Bayes principle is to choose that particular $a_i$ for which $\bar{L}(a_i)$ is a minimum,
or, which comes to the same thing, that for which the utility is a maximum. If
some information is available on $p_\theta$, from other experiments or from intuition,
this can be used, but if no information is available, $p_\theta$ is to be taken as uniform
over all $\theta$.


EXAMPLE 9  A dealer buys fuses in lots of 10,000 and sells them at 10 cents
each, with a double-money-back guarantee if they prove defective. To protect
himself he takes a sample of $N$ for destructive testing, and refuses to buy the lot
if $n$ or more of the samples are defective. What value should he choose for $n$?

The probability of $x$ defectives, if the proportion of defectives in the whole
lot is $\theta$, is

$b(x, N, \theta) = \binom{N}{x}\theta^x(1 - \theta)^{N-x}$

The probability of accepting a lot with proportion $\theta$ is
$\sum_{x=0}^{n-1} b(x, N, \theta) = 1 - B(n, N, \theta)$
and the expected net income in dollars received from such a lot is

$u(n, \theta) = (1 - 2\theta)\left(1000 - \frac{N}{10}\right)\{1 - B(n, N, \theta)\}$

since $N$ of the 10,000 have been destroyed in sampling, and the dealer has to
pay out 20 cents for each defective one he sells. Suppose he estimates the prior
probability of $\theta$ as $p_\theta$. His expected income from the decision rule he has
adopted is

$u(n) = \int_0^1 u(n, \theta)\,p_\theta\,d\theta$

and he should choose $n$ so as to make this as great as possible. If he feels that
any value of $\theta$ is as likely as any other, he will put $p_\theta = 1$, and maximize the
quantity

(6.14.3)    $\frac{u(n)}{1000 - N/10} = \int_0^1 \sum_{x=0}^{n-1} \binom{N}{x}(1 - 2\theta)\theta^x(1 - \theta)^{N-x}\,d\theta$

The integral can be evaluated in terms of beta functions and reduces to

$\frac{n(N + 1 - n)}{(N + 1)(N + 2)}$

which is a maximum when $n = (N + 1)/2$. The lot should be
rejected if there are $k$ or more defectives in a sample of size $2k - 1$.
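
The reduction of (6.14.3) can be checked exactly for any particular $N$. The sketch below integrates each term with the beta-function identity $\int_0^1 \theta^a(1 - \theta)^b\,d\theta = a!\,b!/(a + b + 1)!$ for non-negative integers $a$ and $b$, and compares the result with the closed form. The function names and the choice $N = 9$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """Integral of theta^a (1 - theta)^b over (0, 1), integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def scaled_income(n, N):
    """Right-hand side of Eq. (6.14.3), integrated term by term."""
    total = Fraction(0)
    for x in range(n):
        # (1 - 2 theta) theta^x (1 - theta)^(N - x) splits into two beta integrals
        total += comb(N, x) * (beta_int(x, N - x) - 2 * beta_int(x + 1, N - x))
    return total

def closed_form(n, N):
    return Fraction(n * (N + 1 - n), (N + 1) * (N + 2))

N = 9  # hypothetical sample size
best_n = max(range(N + 2), key=lambda n: scaled_income(n, N))
print(best_n, scaled_income(best_n, N), closed_form(best_n, N))
```

For an odd sample size $N = 2k - 1$ the search returns $n = k$, in agreement with the rule stated in the example.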
It may be considered unrealistic to suppose that $p_\theta$ is constant, and the dealer
could, for example, suppose that $\theta$ is equally likely to be anywhere between
0.01 and 0.05, but is quite unlikely to be outside these bounds. That is, he could
assume a rectangular distribution for $\theta$. The integral in Eq. (3) will then be
expressible in incomplete beta functions, and by the use of tables the maximizing
value of $n$ can be found.


6.15 Wald's Principle (Minimax Principle)  To avoid having to estimate the
prior probabilities, Wald suggested the principle of choosing the action which
would minimize the maximum risk that could be feared whatever the state of
Nature might be. In the example above, if $\theta$ could be as high as 1, or even a
little more than 0.5, this principle would tell the dealer to refuse to accept the lot
without troubling to sample it at all. If, however, he feels that the worst possible
lot would have a $\theta$ equal to 0.05, say, he will choose $n$ so as to maximize $u(n, 0.05)$.
This will mean accepting the lot without sampling, since then he gets as much
income as possible and avoids the loss of the fuses destroyed in testing.


EXAMPLE 10  This is a simplified betting problem, suggested by Sprowls [7].
The bettor has two possible actions in each case, to bet or not to bet. A bet is
always to win and always at the same odds; if he wins he gains $\alpha$, if he does not
win he loses $\beta$. He has a system which gives a probability $\theta$ of picking a winner,
and he decides whether or not to bet on any particular race by the number of
wins, $x$, recorded in the $N$ previous races on which he has bet. If $x \ge n$, he will
bet; if $x < n$, he will not. The problem is to decide on $n$.

Assuming that the races can be treated as statistically independent events, the
probability of exactly $x$ wins is $b(x, N, \theta)$ so that the probability of betting is
$B(n, N, \theta)$. If the bettor decides to bet, his expected loss per race is $\beta(1 - \theta) - \alpha\theta$,
which is positive for $\theta < \theta_0$, where $\theta_0 = \beta/(\alpha + \beta)$.

If he decides not to bet at all, his gain will be zero, but he will lose what he
might have won by betting if $\theta > \theta_0$. The risk function is

(6.15.1)    $r(n, \theta) = [\beta(1 - \theta) - \alpha\theta] \cdot B(n, N, \theta), \quad \theta \le \theta_0$
            $r(n, \theta) = [\alpha\theta - \beta(1 - \theta)] \cdot [1 - B(n, N, \theta)], \quad \theta > \theta_0$

Using the normal approximation to the cumulative binomial, we have $B = 1 - \Phi(z)$, where

(6.15.2)    $z = \frac{n - 1/2 - N\theta}{[N\theta(1 - \theta)]^{1/2}}$

The Wald principle is to pick $n$ so as to minimize the maximum value of $r$ over
all possible $\theta$. This minimum occurs when the maximum of $r$ for $\theta < \theta_0$ is equal
to the maximum of $r$ for $\theta > \theta_0$. The actual calculation of these maxima can be
done numerically with the help of good tables of the normal law (e.g., reference
[8] of Chapter 3), and it turns out that the maxima occur at
$\theta = \theta_0 \pm 0.752[\theta_0(1 - \theta_0)/N]^{1/2}$.
The approximate solution of the problem is to take $n$ as
$N\theta_0 = N\beta(\alpha + \beta)^{-1}$, and if $x \ge n$ to bet on the next race.
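
The minimax calculation of Example 10 can also be carried out by brute force. The sketch below is an illustration rather than the method of the text: it uses the exact binomial tail in place of the normal approximation (6.15.2), searches a grid of values of $\theta$, and takes the hypothetical values $N = 21$ and $\alpha = \beta = 1$. All function names are assumptions.

```python
from math import comb

def upper_tail(n, N, theta):
    """B(n, N, theta): probability of n or more wins in N races."""
    return sum(comb(N, x) * theta**x * (1 - theta) ** (N - x) for x in range(n, N + 1))

def risk(n, N, theta, alpha, beta):
    """Risk function of Eq. (6.15.1)."""
    theta0 = beta / (alpha + beta)
    p_bet = upper_tail(n, N, theta)
    if theta <= theta0:
        return (beta * (1 - theta) - alpha * theta) * p_bet
    return (alpha * theta - beta * (1 - theta)) * (1 - p_bet)

def minimax_n(N, alpha, beta, grid=2001):
    """Value of n that minimizes the largest risk over a grid of theta."""
    thetas = [i / (grid - 1) for i in range(grid)]
    def worst(n):
        return max(risk(n, N, t, alpha, beta) for t in thetas)
    return min(range(N + 2), key=worst)

print(minimax_n(21, 1.0, 1.0))
```

With even odds $\theta_0 = 0.5$, so that $N\theta_0 = 10.5$; the search gives the next integer above this, which is consistent with the approximate solution once the continuity correction of $1/2$ in (6.15.2) is taken into account.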
6.16 Game Theory and Statistics  A subject which has come to the fore in
recent years is the theory of games, which, besides its application to ordinary
parlor games, has a very distinct relevance to economics and to military strategy.
A good many statistical problems can be thought of as games played against
Nature, although Nature of course is not a malevolent opponent, out to make
things as bad as possible for the statistician. It is hardly surprising that in these
circumstances Wald's principle is often unduly pessimistic, and several other
decision criteria have been suggested. References [8], [9] and [10] may be consulted
for more details.


PROBLEMS
A. (§§ 6.1-6.5)

1. Prove that the maximum likelihood estimator for the parameter $\mu$ of a Poisson
distribution is the sample mean $m$, and that the variance of $m$ is $\mu/N$, where $N$ is the
sample size.

2. Show that, for the Poisson distribution, $m$ is a sufficient estimator for $\mu$. Also
show that condition (b) of Eq. (6.4.3) is satisfied.

3. For a normal population with mean $\mu$ and variance $\sigma^2$, the mean and the median
of a sample of $N$ are both consistent estimators of $\mu$. For large $N$ the variance of the
median is approximately $\pi\sigma^2/2N$. Show that as an estimator of $\mu$ the median is roughly
64% efficient.

4. Suppose that the mean $\mu$ of a normal population is known, but that the variance
$\sigma^2$ is to be estimated. Show that the sample variance $s^2$, although unbiased, has an
efficiency $(N - 1)/N$ and is therefore only asymptotically most efficient, while the
sample second moment about $\mu$ is both unbiased and most efficient, for any $N$.

5. Show that for a binomial population with probability of success $\theta$ in each trial,
the maximum likelihood estimator of $\theta$ is the proportion of successes $p$ in a sample.
Show also that the variance of $p$, as given by Eq. (6.3.3), agrees with that previously
found, namely, $\theta(1 - \theta)/N$.

6. A gamma variate has the density function $f(x) = x^{\alpha-1}e^{-x}/\Gamma(\alpha)$,
$x > 0$. Write down the equations for determining from a sample of size $N$ the maximum
likelihood estimator for $\alpha$ and its variance. (In order to solve these equations, tables
of the digamma function $d \log \Gamma(\alpha)/d\alpha$ and the trigamma function $d^2 \log \Gamma(\alpha)/d\alpha^2$
must be used. See H. T. Davis, Tables of the Higher Mathematical Functions,
Bloomington, Indiana, 1933-5.)

7. Show that an unbiased estimator of $\alpha$ in Problem 6 is the arithmetic mean $m$
of the sample. Show also that the efficiency of this estimator is $(\alpha\,d^2[\log \Gamma(\alpha)]/d\alpha^2)^{-1}$.
This quantity tends to zero as $\alpha$ decreases to 0. The nearer $\alpha$ is to zero the more skew
is the distribution.

8. The mean absolute deviation for a sample of size $N$ is defined as
$d = \sum |x_i - m|/N$. For samples from a normal population of mean $\mu$ and variance
$\sigma^2$, the variance of $d$ is given by

$\frac{2\sigma^2(N - 1)}{\pi N^2}\left\{\frac{\pi}{2} + [N(N - 2)]^{1/2} - N + \sin^{-1}\left[\frac{1}{N - 1}\right]\right\}$

Compare the asymptotic efficiency of the quantity $d\sqrt{\pi/2}$ with that of the sample
standard deviation $s$, as estimators of $\sigma$. Hint: Prove that $V(d) = \sigma^2(1 - 2/\pi)/N + O(1/N^2)$,
and note that $V(s) = \sigma^2/2N + O(N^{-2})$.

9. Show that for the continuous distribution with density $f(x) = \theta e^{-\theta x}$, $0 < x < \infty$,
confidence limits for $\theta$ for a large sample, with confidence coefficient 0.95, are given by
$(1 \pm 1.96/\sqrt{N})/m$ where $m$ is the sample mean. Hint: For large $N$, the m.l. estimator is
approximately normal.

10. Show that for the rectangular population with density $f(x) = (\beta - \alpha)^{-1}$,
$0 < \alpha < x < \beta$, joint maximum likelihood estimators for $\alpha$ and $\beta$ are the smallest and
largest members of the sample respectively. Hint: Show that these give the greatest
possible value for $L$.

11. Suppose that the discrete variate $X$ is binomially distributed except that it cannot
take the value 0. The probability that $X = x$ ($x = 1, 2, \ldots, n$) is given by
$f(x) = \binom{n}{x}\theta^x(1 - \theta)^{n-x}[1 - (1 - \theta)^n]^{-1}$. If the numbers of successes in $N$ repetitions of the
sequence of $n$ trials are $x_1, x_2, \ldots, x_N$, show that the maximum likelihood estimator $\hat{\theta}$ of
$\theta$ is given by the solution of the equation $n\hat{\theta} = m[1 - (1 - \hat{\theta})^n]$, where $m$ is the
arithmetic mean of the $x_i$. Find $\hat{\theta}$ for the case $n = 2$.

12. Obtain an equation for the maximum likelihood estimator of $\rho$ derived from a
sample of size $N$ from the bivariate standard normal population with joint density
function

$f(x, y) = (2\pi)^{-1}(1 - \rho^2)^{-1/2} \exp\left(-\frac{1}{2}\,\frac{x^2 - 2\rho xy + y^2}{1 - \rho^2}\right)$

Show that the variance of this estimator is $(1 - \rho^2)^2/[N(1 + \rho^2)]$. Hint: $E(x^2) = E(y^2) = 1$,
$E(xy) = \rho$.


B. (§§ 6.6-6.12)

1. The yield in bushels of a certain type and size of potato plot is found to be
normally distributed with a standard deviation of 2.36. It is hoped that the application
of a certain fertilizer will increase the yield by at least 0.5 bushel. How large a sample
of plots should be used to detect a difference of this amount, using the mean sample
yield as a criterion, with a test of size 5% and power 90%?

2. An experimenter knows that a distribution is approximately normal with
standard deviation 1.2. He wishes to test the hypothesis that the population mean is
75 against the alternative hypothesis that $\mu > 75$, using a sample of size $N$ and a test
of size 1%. What test should he use? Calculate the power for $N = 9$ and for $\mu = 75.5$,
76, 76.5. What size sample should he take if he wants to be 95% sure of detecting a
difference as small as one unit from the assumed value 75, still using a test of size 1%?

3. A population has the Poisson distribution with parameter $\mu$, which may have
the values 1 or 2 but no others. Find the likelihood-ratio test for testing $H_1$ (that $\mu = 1$)
against $H_2$ (that $\mu = 2$), using the mean of 10 observations of $X$ as the criterion. Assume
that the probability of error of the first kind is not greater than 0.05, and calculate the
power of the test. Hint: The distribution of the sum of $N$ independent Poisson variates
with parameter $\mu$ is also Poisson with parameter $N\mu$. Use a table of the cumulative
Poisson function for numerical results.

4. Develop the likelihood-ratio test for a binomial population, for testing the simple
hypothesis $\theta = \theta_0$ against the simple alternative $\theta = \theta_1$ (where $\theta_1 > \theta_0$), using as a
criterion the number $x$ of successes in the first $n$ trials. If the size of the test is approximately
$\alpha$ and the power approximately $1 - \beta$, find a relation to determine $n$. Give
a numerical result for $\theta_0 = 0.5$, $\theta_1 = 0.7$, $\alpha = 0.05$, $\beta = 0.10$. Hint: Use the normal
approximation to the binomial.

5. Find the likelihood-ratio test for testing the significance of the difference between
the mean of a sample and an assumed population mean $\mu_0$, the population being
normal with unknown standard deviation $\sigma$. Hint: Find the ratio of the maximum
likelihood over all $\sigma$ for a given $\mu_0$ to that over all $\sigma$ and all $\mu$. The test is equivalent to
Student's t-test, which will be discussed more fully in Chapter 8.

6. Assume that the random variables $Y_i$, $i = 1, 2, \ldots, N$, are independent and
normally distributed with means $\eta_i = \alpha + \beta x_i$, and a common variance $\sigma^2$, for known
values of $x_i$. Write down the likelihood function for the set of $Y_i$ and hence obtain
joint maximum likelihood estimators for the parameters $\alpha$, $\beta$ and $\sigma$. (These estimators
will be discussed more fully in Chapter 11.)

7. It is desired to test the null hypothesis that a certain coin is fair (that is, that the
probability $\theta$ of a head when the coin is tossed is 0.5) by counting the number of heads
$x$ in $n$ tosses. Show that the likelihood-ratio test of $H_0$ against the alternative hypothesis
$H_1$ (that $\theta$ is either less than or greater than 0.5) is equivalent to rejecting $H_0$
when $|x - n/2| > k$, where $k$ is determined by the size of the test.

If the test is to be of size 0.05 and power 0.9 to detect a difference of 0.02 in $\theta$ from
the assumed value 0.5, how large should $n$ be? Hint: The likelihood-ratio test may be
written: reject $H_0$ when $f(x) > c$, where $f(x) = x \log(x/n) + (n - x)\log(1 - x/n)$.
Show that $f(x)$ has a minimum at $x = n/2$ and is symmetrical about this value. For
the second part of the question use the normal approximation to the binomial.


C. (§§ 6.13-6.15)

1. Carry out the integration indicated in Eq. (6.14.3) and show that it reduces to
the stated value. Hint: Use Eq. (4.5.3) and express the gamma functions as factorials.

2. A bag contains 10 balls, either black or white, but it is not known how many of
each. A ball is drawn at random, looked at and replaced, and three times running the
ball so selected is white. What is the probability that the bag contains at least five
white balls? Hint: Use Bayes' rule, with sums instead of integrals. Obtain a numerical
result by assuming a constant value for the prior probability of $\theta$ white balls
($\theta = 0, 1, 2, \ldots, 10$).

3. Instead of the assumption at the end of Problem 2, suppose that the bag was
filled by picking 10 balls at random from a very large number of black and white balls
mixed in equal proportions. What is now the probability of at least five white balls,
after seeing the three white balls drawn? Hint: The prior probability is binomial.

4. A set of 100 independent observations is made on a variate $X$ which is normally
distributed with unknown mean $\mu$ and known variance 25 units. The null hypothesis
$H_0$ is that $\mu = 0$, and the alternative hypothesis $H_1$ is that $\mu = 2$ (these are the only
possibilities). A decision whether to accept $H_0$ or $H_1$ is made on the basis of the mean
($m$) of the observations $x$. If $H_0$ is true, the losses corresponding to these two
decisions ($d_0$ and $d_1$) are 0 and 25, respectively; if $H_1$ is true the losses are 10 and 0,
respectively.
Given that $\xi$ is the prior probability of $H_0$, show that, on the Bayes principle of
minimizing the expected loss, the decision $d_0$ should be taken if $m < c$ where
$c = 1 + (1/8)\log\{5\xi/[2(1 - \xi)]\}$. Hint: The mean $m$ is normally distributed with variance 1/4.
Use Bayes' rule to find the probability of $H_0$ after the sample has been examined.

5. In Problem 4 above, find the probability $\alpha(\xi)$ of rejecting $H_0$ if true and the probability
$\beta(\xi)$ of accepting $H_0$ if false. The expected loss is $25\xi\alpha(\xi) + 10(1 - \xi)\beta(\xi)$. Compute
this quantity for various values of $\xi$ and show that it has a maximum.

6. Solve Problem 4 above using the minimax principle. With $\xi$ unknown, the
maximum risk is $25\alpha(\xi)$ if $H_0$ is true and $10\beta(\xi)$ if $H_0$ is false. The minimum of the
maximum risk occurs when $25\alpha(\xi) = 10\beta(\xi)$. (This is the least favorable value of $\xi$.)

7. A buyer of articles in large lots is willing to accept a lot if the
proportion $\theta$ of defective articles is less than $\theta_0$, but will wish to reject it if $\theta > \theta_0$. If
he accepts a lot, his loss $L_1(\theta)$ will be zero if $\theta < \theta_0$ but positive if $\theta > \theta_0$. If he rejects
a lot his loss $L_2(\theta)$ will be zero if $\theta > \theta_0$ but positive if $\theta < \theta_0$. He bases his decision on
the number of defectives $r$ in a sample of $N$ (assumed binomial). What will be his
decision rule on the Bayes principle if the prior probability of $\theta$ is $\xi(\theta)$? Show that this is
equivalent to the rule: accept the lot if $r < c$, where $c$ is some fixed number. Hint: He
will accept with $r$ defectives if the expected loss in accepting is less than the expected
loss in rejecting. Show that if this is true for $r = c$ it is also true for $r = c - 1$ and so
for all $r < c$.
Chapter 7
SOME SAMPLING PROCEDURES


7.1 Random and Less Random Samples  Sampling is undertaken in order to
find out something about a population without having to examine every item
in it. By "a population," we mean any collection (usually large) of elements
such as people, pigs, farms, coin-tosses, incomes, or whatever it may be, about
which we want some information. Often an important decision must be made on 
the basis of knowledge obtained from the sample, so that it is useful to be able to 
estimate how reliable this knowledge actually is. Sampling theory is concerned 
with ways of estimating, and perhaps improving, the precision of the information 
obtainable from a sample about a population. 

Any procedure for making such an estimate must be based on the theory of
probability. That is, it must suppose that the sample is random. Most of the
theory of estimation, hypothesis-testing and decision-making that we have been
considering in the last two chapters is based on the concept of a random sample.
A sample of given size is said to be random if every possible sample of this size
in the population (supposedly finite) has a calculable probability of being
chosen, but this probability need not be the same for all items. If, however, the
sample (of size $N$) is selected in such a way that every combination of $N$ elements
in the population has an equal probability of being chosen, the process is called
simple random sampling. This is the usual assumption in theoretical statistics,
although in actual sample surveys simple random sampling is rarely used. For
reasons of cost and administrative convenience, as well as in order to improve
precision, some modification of the simple random design is generally adopted.
For a sample of $N$ from a finite population of size $M$ the probability that
any individual item will be drawn is $N/M$. If the population is infinite,
this probability is zero, but it still makes sense in many cases to assume that
one item is as likely to be drawn as another (see § 5.1). When a number of
tosses, made with a particular coin, is considered as a sample of the practically
infinite number of tosses that might conceivably be made with this same coin, the
sample is obviously not random in the strict sense, since it consists of the first $N$
items of the population in order of time. However, we introduce the physical
assumption that the order in time is quite irrelevant as far as the characteristic
of any toss (heads or tails) is concerned, so that the first $N$ tosses form effectively
a random sample.

If some prior information is available about the population, it may be possible to use
stratified sampling and so gain in precision over simple random sampling. In this
procedure, the population is divided into groups, the elements within a group being
more alike than those in the population as a whole. If a simple random sample
is drawn from each group, we still have a probability sampling procedure, but 
we have insured that each group is represented in the total sample. The groups 
are called strata, and the process of dividing the population into groups is called 
stratification. This procedure normally reduces the sampling variance of the 
variate measured. It is particularly effective when there are extreme values in 
the population—stratification with regard to income levels, for example, is a 
common practice. The costs of sampling may differ considerably from one 
stratum to another (as between urban and rural households, for instance) and 
these costs may be important in setting up the strata. 

The people who conduct sample surveys are usually much concerned with 
questions of cost. They want the maximum precision per dollar spent, and
therefore tend to favor cluster sampling. This is a method of reducing costs by 
first taking a random sample of groups or clusters and then taking sub-samples 
from the clusters selected. To take a sample of 3000 households from the popu- 
lation of the United States, we might first draw a sample of, say, 50 counties 
and then sample these proportionately to their total populations. A simple 
random sample would probably be spread over many more than 50 counties, and 
would need much more travel and supervision. 

Cluster sampling may not be very efficient as far as precision of the estimate is 
concerned. The best results occur when the clusters each contain very diversified 
elements—just the opposite from the requirements for stratified sampling. 

Another common procedure in some types of sampling surveys is systematic 
sampling. To draw 500 cards from a file containing 10,000 cards, we could 
select a random number between 1 and 20 (say 13) and then take every 20th 
card, beginning with the 13th. That is, we could pick the cards numbered 13, 
33, 53, and so on. If the order of the cards has nothing whatever to do with the 
variate for which we are sampling, this gives us effectively a random sample and 
it is easy to apply. To sample housing units in a city one might, for instance, take
every 12th block and every fifth house in the block, but it would be well to make 
sure that the procedure adopted did not lead to picking out an undue proportion 
of corner houses—at least if the object of the survey has any connection with 
economic status. Corner houses often pay higher taxes and are generally 
occupied by people with higher incomes than non-corner houses. 
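
The card-file procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and the seeded generator are assumptions of the sketch, not part of the text.

```python
import random

def systematic_sample(population_size, step, rng=None):
    """Return 1-based positions: a random start in 1..step, then every step-th item."""
    rng = rng or random.Random()
    start = rng.randint(1, step)          # e.g. 13 when step = 20
    return list(range(start, population_size + 1, step))

# 500 cards from a file of 10,000: every 20th card after a random start
cards = systematic_sample(10_000, 20, random.Random(0))
```

Whatever the random start, the sample contains exactly 500 positions spaced 20 apart, which is what makes the method easy to apply in the field.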

A method of selecting a sample often employed in public-opinion polls is
that of quota sampling, in which an interviewer is instructed to fill a specified 
quota by finding as best he can persons satisfying certain restrictions—he may 
be asked to contact a specified number of persons of a particular sex, age-group, 
and income-group, for example, but no attempt is made to make the sample 
random. This method is apt to introduce a completely unknown bias into the 
estimates made. 

Any method of sampling which is not random, but tries to pick out a typical 
or representative sample, may be called purposive sampling. This may be useful 
if only a very small sample can be taken, and if the person picking the sample has
good judgment and expert knowledge, but there is no statistical theory available
for measuring the reliability of the results obtained. Sometimes, of course, a 
random sample is from the nature of things practically unobtainable—a sample 
of fish from the sea, for instance—and we are forced to use any kind of sample 
we can get. Nevertheless a probability sample should be obtained whenever 
possible, and only then is the theory of sampling strictly applicable. For a fuller
discussion of sampling procedures see [1] and [2]. 


* 7.2 Stratified Sampling  Suppose the population of size $M$ is divided into
strata of sizes $M_1, M_2, \ldots, M_k$, and a simple random sample of size $N_i$ is taken
from the $i$th stratum. If $X_{i\alpha}$ is the measured characteristic for the $\alpha$th item in the
$i$th stratum, the $i$th stratum mean for the population is

(7.2.1)    $\mu_i = \frac{1}{M_i} \sum_{\alpha=1}^{M_i} X_{i\alpha}$

and the over-all mean for the population is

(7.2.2)    $\mu = \frac{1}{M} \sum_{i=1}^{k} M_i \mu_i$

The estimator of $\mu$, based on the stratified sample, is $\bar{X}$, where

(7.2.3)    $\bar{X} = \frac{1}{M} \sum_{i=1}^{k} M_i \bar{X}_i, \qquad \bar{X}_i = \frac{1}{N_i} \sum_{\alpha=1}^{N_i} X_{i\alpha}$

and $X_{i\alpha}$ in the second sum is the $\alpha$th item in the sample from the $i$th stratum. Note
that $\bar{X}$ is a weighted mean of the sample stratum means, with weights depending
on the sizes of the strata in the population, so that these sizes must be known
fairly accurately before the method can be used.
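Since each stratum is sampled by simple random sampling, $E(\bar{X}_i) = \mu_i$, and
therefore, by Eq. (3) above,

(7.2.4)    $E(\bar{X}) = \frac{1}{M} \sum_{i=1}^{k} M_i E(\bar{X}_i) = \mu$

by Eq. (2). That is, $\bar{X}$ is an unbiased estimator of $\mu$. We will now show that the
variance of $\bar{X}$ is given by

(7.2.5)    $V(\bar{X}) = \frac{1}{M^2} \sum_{i=1}^{k} \frac{M_i (M_i - N_i)}{N_i} \sigma_i^2$

where $\sigma_i^2 = (M_i - 1)^{-1} \sum_{\alpha=1}^{M_i} (X_{i\alpha} - \mu_i)^2$, the population variance in the $i$th
stratum.

Since $\bar{X}$ is a linear combination of the $\bar{X}_i$ with coefficients $M_i/M$, and since
the strata are sampled independently, we can use Bienaymé's Theorem (§ 2.14):

$V(\bar{X}) = \sum_{i=1}^{k} \frac{M_i^2}{M^2} V(\bar{X}_i) = \frac{1}{M^2} \sum_{i=1}^{k} M_i^2 \kappa_{2i} \left( \frac{1}{N_i} - \frac{1}{M_i} \right)$

by Eq. (5.11.7), where $\kappa_{2i} = \sigma_i^2$. If $f_i$ is the sampling fraction $N_i/M_i$,

(7.2.6)    $V(\bar{X}) = \frac{1}{M^2} \sum_{i=1}^{k} \frac{M_i^2 (1 - f_i)}{N_i} \sigma_i^2$

which is equivalent to Eq. (5).
Since $k_{2i}$ $[= (N_i - 1)^{-1} \sum_{\alpha} (X_{i\alpha} - \bar{X}_i)^2]$ is an unbiased estimator of $\kappa_{2i}$,
we may estimate the variance of $\bar{X}$ by means of

(7.2.7)    $\hat{V}(\bar{X}) = \frac{1}{M^2} \sum_{i=1}^{k} \frac{M_i^2 (1 - f_i)}{N_i} k_{2i}$

and $k_{2i}$ is the sample variance of the items from the $i$th stratum.
If $f_i$ is the same for each stratum, and equal to $f$, say, the sampling is said to be
proportionate (the $N_i$ are proportional to the $M_i$). Then

(7.2.8)    $V(\bar{X}) = \frac{1 - f}{M^2} \sum_{i=1}^{k} \frac{M_i^2}{N_i} \sigma_i^2 = \frac{1 - f}{MN} \sum_{i=1}^{k} M_i \sigma_i^2$

since $N = fM$.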


If $X$ and $Y$ are two variates, both measured for each item in the sample, the
covariance of $\bar{X}$ and $\bar{Y}$ is similarly given by

(7.2.9)    $C(\bar{X}, \bar{Y}) = \frac{1}{M^2} \sum_{i=1}^{k} \frac{M_i^2 (1 - f_i)}{N_i} \lambda_i$

where $\lambda_i = (M_i - 1)^{-1} \sum_{\alpha} (X_{i\alpha} - \mu_i)(Y_{i\alpha} - \nu_i)$, the population covariance in
the $i$th stratum, $\nu_i$ being the stratum mean for $Y$. As before, $\lambda_i$ may be estimated
from the sample covariance in this stratum.

The gain in precision due to using proportionate sampling, compared with
simple random sampling from the whole population, may be found by comparing
Eq. (7.2.8) with the expression for the variance of the mean $\bar{X}'$ of a random
sample, namely,

(7.2.10)    $V(\bar{X}') = \sigma^2 (1 - f)/N$

where $\sigma^2 = (M - 1)^{-1} \sum_{\alpha} (X_\alpha - \mu)^2$, and $\alpha$ takes any value from 1 to $M$.
Therefore,

(7.2.11)    $V(\bar{X}') - V(\bar{X}) = \frac{1 - f}{N} \left( \sigma^2 - \frac{1}{M} \sum_{i=1}^{k} M_i \sigma_i^2 \right)$

In practice, the $M_i$ are usually so large that the distinction between $M_i$ and
$M_i - 1$ is unimportant, and we can put

$M\sigma^2 = \sum_{\alpha=1}^{M} (X_\alpha - \mu)^2, \qquad M_i \sigma_i^2 = \sum_{\alpha=1}^{M_i} (X_{i\alpha} - \mu_i)^2$

Since

$(X_{i\alpha} - \mu_i)^2 = (X_{i\alpha} - \mu)^2 + (\mu - \mu_i)^2 + 2(\mu - \mu_i)(X_{i\alpha} - \mu)$

we have

$M_i \sigma_i^2 \approx \sum_{\alpha=1}^{M_i} (X_{i\alpha} - \mu)^2 + M_i (\mu - \mu_i)^2 + 2M_i (\mu - \mu_i)(\mu_i - \mu)$
$\qquad\quad = \sum_{\alpha=1}^{M_i} (X_{i\alpha} - \mu)^2 - M_i (\mu_i - \mu)^2$

so that

(7.2.12)    $V(\bar{X}') - V(\bar{X}) \approx \frac{1 - f}{MN} \sum_{i=1}^{k} M_i (\mu_i - \mu)^2$

and this expression is never negative. The gain from proportionate sampling
is greater, the greater the differences between the stratum means.
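
As a numerical illustration of Eqs. (7.2.8) and (7.2.10), the following Python sketch compares the two variances for a small made-up population of three strata. The data and the helper names (`pop_var`, `var_simple_random`, `var_proportionate`) are assumptions of the sketch.

```python
from statistics import mean

def pop_var(xs):
    """Population variance with divisor (size - 1), as in the text."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def var_simple_random(pop, n):
    """Eq. (7.2.10): sigma^2 (1 - f) / N with f = N / M."""
    return pop_var(pop) * (1 - n / len(pop)) / n

def var_proportionate(strata, f):
    """Eq. (7.2.8): (1 - f) / (M N) * sum_i M_i sigma_i^2, with N = f M."""
    M = sum(len(s) for s in strata)
    N = f * M
    return (1 - f) / (M * N) * sum(len(s) * pop_var(s) for s in strata)

strata = [[1, 2, 3, 4], [10, 11, 12, 13], [20, 22, 24, 26]]  # made-up data
pop = [x for s in strata for x in s]
f = 0.5                                    # two items drawn from each stratum
v_ran = var_simple_random(pop, int(f * len(pop)))
v_str = var_proportionate(strata, f)
```

Because the stratum means here are far apart, `v_str` comes out much smaller than `v_ran`, in line with the remark that the gain grows with the differences between the stratum means.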

The question arises as to the optimum choice of the $N_i$. It was proved by
Neyman [3] that for a fixed $N$ the variance of $\bar{X}$ is least when $N_i$ is proportional
to $M_i \sigma_i$. That is, we should choose $N_i$ so that

(7.2.13)    $N_i / N = M_i \sigma_i \Big/ \sum_{i=1}^{k} (M_i \sigma_i)$

This, of course, supposes that some information, from a preliminary survey
or from previous experience, is available about the $\sigma_i$.
If the cost $c_i$ per unit of sampling also varies from one stratum to another,
and if the total cost $c$ of the whole survey is fixed, it may be shown that the
optimum sampling number for the $i$th stratum is proportional to $M_i \sigma_i / c_i^{1/2}$.
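
The optimum allocation of Eq. (7.2.13) is easy to compute once the stratum sizes and standard deviations are given. The Python sketch below is a minimal illustration; the function name, the example figures, and the choice to scale the cost-weighted version to a given total $N$ (rather than to a fixed total cost) are simplifying assumptions.

```python
from math import sqrt

def neyman_allocation(sizes, sigmas, n_total, unit_costs=None):
    """N_i proportional to M_i*sigma_i (Eq. 7.2.13), or to M_i*sigma_i/sqrt(c_i)
    when unit costs are supplied. Returns real-valued allocations; rounding is
    left to the caller."""
    if unit_costs is None:
        unit_costs = [1.0] * len(sizes)
    weights = [m * s / sqrt(c) for m, s, c in zip(sizes, sigmas, unit_costs)]
    total = sum(weights)
    return [n_total * w / total for w in weights]

# Hypothetical strata: sizes 1000, 3000, 6000 with standard deviations 20, 10, 5
alloc = neyman_allocation([1000, 3000, 6000], [20.0, 10.0, 5.0], 400)
```

In this example the smallest stratum receives a quarter of the sample although it holds only a tenth of the population, because its items are the most variable.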


* 7.3 Cluster Sampling  In simple cluster sampling, the elements of the
population are grouped in clusters which themselves are the primary sampling
units. In a one-stage plan all the elements in the selected clusters (picked by
simple random sampling) are included in the sample. In a two-stage plan a
random sub-sample is selected from each primary sampling unit, and of course
further stages can be introduced.
Suppose that there are $k$ clusters in the population, with sizes
$M_i$ ($i = 1, 2, \ldots, k$), and $l$ of these are selected in the first stage. From the $j$th
selected cluster ($j = 1, 2, \ldots, l$), the number of items chosen is $N_j$.
Let $X_{i\alpha}$ be the value of $X$ for the $\alpha$th item in the $i$th cluster and $X_{j\alpha}$ the value
for the $\alpha$th item picked from the $j$th selected cluster ($\alpha = 1, 2, \ldots, N_j$).
With a notation similar to that used before,

$M_i \mu_i = \sum_{\alpha=1}^{M_i} X_{i\alpha}, \qquad N_j \bar{X}_j = \sum_{\alpha=1}^{N_j} X_{j\alpha}$

$M = \sum_{i=1}^{k} M_i, \qquad N = \sum_{j=1}^{l} N_j, \qquad M\mu = \sum_{i=1}^{k} M_i \mu_i$

We take as the estimator of $\mu$ the quantity

(7.3.1)    $\bar{X} = \frac{k}{lM} \sum_{j=1}^{l} M_j \bar{X}_j$

and it can be shown that

(7.3.2)    $E(\bar{X}) = \mu$

and

(7.3.3)    $V(\bar{X}) = \frac{k^2}{lM^2} \left( 1 - \frac{l}{k} \right) \tau^2 + \frac{k}{lM^2} \sum_{i=1}^{k} M_i^2 \sigma_i^2 \left( \frac{1}{N_i} - \frac{1}{M_i} \right)$

where

(7.3.4)    $\tau^2 = (k - 1)^{-1} \sum_{i=1}^{k} (M_i \mu_i - M\mu/k)^2$

and $\sigma_i^2$ has the same meaning as in Eq. (7.2.5).
From Eq. (1), $E(\bar{X}) = \frac{k}{lM} \sum_{j=1}^{l} E(M_j \bar{X}_j)$. Now the actual value of $M_j \bar{X}_j$
depends upon two random events, the selection of the $j$th cluster and the selection
of the $N_j$ items from this cluster. It is shown in Appendix A.14 that if
$X$ is a random variable depending on $Y$ which is itself a random event, then
$E(X) = E[E(X \mid Y)]$. Here the event $Y$ is the choice of the $j$th cluster. Given
this choice, the expectation of $M_j \bar{X}_j$ is $M_j \mu_j$, where $\mu_j$ is the mean for the
whole cluster. Therefore

(7.3.5)    $E(M_j \bar{X}_j) = E(M_j \mu_j) = \frac{1}{k} \sum_{i=1}^{k} M_i \mu_i$

since this cluster is one of $k$ clusters, all with an equal chance of being picked. It
follows that

(7.3.6)    $E(\bar{X}) = \frac{k}{lM} \sum_{j=1}^{l} \frac{1}{k} \sum_{i=1}^{k} M_i \mu_i = \frac{1}{M} M\mu = \mu$


To find the variance of $\bar{X}$ we need Theorem 2 of Appendix A.14, according
to which $V(X) = E[V(X \mid Y)] + V[E(X \mid Y)]$, where $X$ is replaced by $\bar{X}$ and $Y$
means the choice of a particular set of $l$ clusters from the set of all $k$ clusters in
the population. When this set is fixed, the problem becomes one of stratified
sampling with $l$ strata. The variance of the $j$th mean $\bar{X}_j$ is, by Eq. (5.11.7),

$V(\bar{X}_j) = \sigma_j^2 \left( \frac{1}{N_j} - \frac{1}{M_j} \right)$

where $\sigma_j^2 = (M_j - 1)^{-1} \sum_{\alpha} (X_{j\alpha} - \mu_j)^2$, so that

$V(\bar{X} \mid Y) = \frac{k^2}{l^2 M^2} \sum_{j=1}^{l} M_j^2 \sigma_j^2 \left( \frac{1}{N_j} - \frac{1}{M_j} \right)$

The expectation of this for any choice of the $l$ clusters is

(7.3.7)    $E[V(\bar{X} \mid Y)] = \frac{k}{lM^2} \sum_{i=1}^{k} M_i^2 \sigma_i^2 \left( \frac{1}{N_i} - \frac{1}{M_i} \right)$

which is the second term in Eq. (3). The first term represents the part of the
variance due to first-stage sampling. Since $E(M_j \bar{X}_j \mid Y) = M_j \mu_j$,

(7.3.8)    $E(\bar{X} \mid Y) = \frac{k}{lM} \sum_{j=1}^{l} M_j \mu_j$

By Eq. (5.11.7),

$V\left( \frac{1}{l} \sum_{j=1}^{l} M_j \mu_j \right) = \kappa_2 (1/l - 1/k)$

where

$\kappa_2 = \sum_{i=1}^{k} (M_i \mu_i - M\mu/k)^2 / (k - 1)$,

so that

(7.3.9)    $V[E(\bar{X} \mid Y)] = \frac{k^2}{M^2} \left( \frac{1}{l} - \frac{1}{k} \right) \sum_{i=1}^{k} \frac{(M_i \mu_i - M\mu/k)^2}{k - 1}$

This gives the first term in Eq. (3). This term is small when the clusters are very
much alike in size and composition.
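
The first-stage part of this result can be checked by brute force on a tiny example. The Python sketch below assumes a one-stage plan (every element of a selected cluster is observed, so the second term of Eq. (7.3.3) vanishes) and enumerates every possible choice of $l$ clusters; the population itself is made up.

```python
from itertools import combinations
from statistics import mean

clusters = [[2, 4], [1, 3, 5, 7], [10, 12, 14], [6]]   # made-up population, k = 4
k, l = len(clusters), 2
M = sum(len(c) for c in clusters)
mu = sum(sum(c) for c in clusters) / M
totals = [sum(c) for c in clusters]                      # the quantities M_i * mu_i

# One-stage plan: N_j = M_j, so the estimator (7.3.1) reduces to
# k/(l*M) times the sum of the selected cluster totals.
estimates = [k / (l * M) * sum(totals[j] for j in chosen)
             for chosen in combinations(range(k), l)]

expected = mean(estimates)                               # average over all equally likely choices
variance = mean((e - expected) ** 2 for e in estimates)

tau2 = sum((t - M * mu / k) ** 2 for t in totals) / (k - 1)   # Eq. (7.3.4)
first_term = k**2 / (l * M**2) * (1 - l / k) * tau2            # first term of Eq. (7.3.3)
```

The enumeration reproduces both the unbiasedness of the estimator and the first term of the variance formula exactly for this example.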


* 7.4 Systematic Sampling  Suppose the population consists of the elements
$E_1, E_2, \ldots, E_M$ arranged in some fixed order. Any systematic sample consists
of the elements $E_i, E_{k+i}, E_{2k+i}, \ldots, E_{(N-1)k+i}$, where $i$ is one of the numbers
$1, 2, \ldots, k$, and $Nk \le M$. Usually, for a sample of size $N$, $k$ is chosen so that $Nk$
is as near to $M$ as possible.
Systematic sampling divides the population in effect into strata, each
consisting of $k$ successive units, and chooses one sampling unit per stratum. The
choice is not random, however, since the unit chosen occupies the same ordinal
position in each stratum. Since a systematic sample is spread evenly over the
population, it often gives a good estimate of the mean, although the variance of
the estimator is in most cases greater than that for a simple random sample.

If $X_\alpha$ is the variate measured on the $\alpha$th element, the sample mean for the $i$th
systematic sample is

(7.4.1)    $m_i = \frac{X_i + X_{k+i} + \cdots + X_{(N-1)k+i}}{N}$

Since there are $k$ values of $i$, all equally likely,

(7.4.2)    $E(m_i) = \frac{1}{k} \sum_{i=1}^{k} m_i = \frac{1}{Nk} \sum_{\alpha=1}^{Nk} X_\alpha$

If $Nk = M$, this expression is the population mean $\mu$, so that $m_i$ is an unbiased
estimator for $\mu$. Also,

(7.4.3)    $V(m_i) = E(m_i - \mu)^2 = \frac{1}{k} \sum_{i=1}^{k} (m_i - \mu)^2 = \frac{1}{N^2 k} \sum_{i=1}^{k} (Nm_i - N\mu)^2$

Now by Eq. (1), $Nm_i - N\mu = (X_i - \mu) + (X_{k+i} - \mu) + \cdots + (X_{(N-1)k+i} - \mu)$,
so that

(7.4.4)    $\sum_{i=1}^{k} (Nm_i - N\mu)^2 = \sum_{\alpha=1}^{Nk} (X_\alpha - \mu)^2 + 2 \sum_{j=1}^{N-1} \sum_{\beta=1}^{k(N-j)} (X_\beta - \mu)(X_{\beta+jk} - \mu)$

The first term on the right-hand side is $(M - 1)\sigma^2$. The second term vanishes
if there is no correlation between pairs such as $X_\beta$ and $X_{\beta+jk}$, separated by $jk$
items. Correlation of this type is called serial correlation.

If the items are serially uncorrelated,

(7.4.5)    $V(m_i) = \frac{(M - 1)\sigma^2}{N^2 k} = \frac{M - 1}{M} \cdot \frac{\sigma^2}{N}$

The corresponding value for the variance of the mean $m_r$ of a random sample of
size $N$ is $(N^{-1} - M^{-1})\sigma^2$, so that

$\frac{V(m_i)}{V(m_r)} = \frac{M - 1}{M - N}$

This is greater than 1 for any $N > 1$. However, if there is a sufficiently large
negative serial correlation, $V(m_i)$ may be less than $V(m_r)$. See [4].
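
For a small ordered population the $k$ possible systematic samples can simply be listed, which gives $V(m_i)$ directly from Eq. (7.4.3). The Python sketch below does this for made-up data with $Nk = M$; the function name `systematic_means` is an assumption of the sketch.

```python
from statistics import mean

def systematic_means(values, k):
    """Means m_i of the k systematic samples (index 0 is start 1, step k);
    assumes len(values) == N * k."""
    return [mean(values[i::k]) for i in range(k)]

values = [3, 7, 1, 8, 5, 9, 2, 6, 4, 10, 12, 11]   # made-up ordered population, M = 12
k = 3                                               # so N = 4 items per sample
m = systematic_means(values, k)
mu = mean(values)
var_systematic = mean((mi - mu) ** 2 for mi in m)   # Eq. (7.4.3)

M, N = len(values), len(values) // k
sigma2 = sum((x - mu) ** 2 for x in values) / (M - 1)
var_random = (1 / N - 1 / M) * sigma2               # mean of a simple random sample of size N
```

Whether `var_systematic` falls above or below `var_random` depends on the serial correlation present in the particular ordering, which is exactly the point made in the text.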


* 7.5 Double Sampling It is sometimes useful to take a preliminary large 
sample in order to get some information which will serve as a basis for drawing a 
sub-sample for further investigation, particularly if the large sample can be 
obtained rather cheaply. The information is used for stratification, or in other 
ways, in order to increase the precision of estimates from the smaller and more 
costly sub-sample. 

For instance, suppose we are concerned with a variable T, such as total sales 
in all retail stores of a certain type, which may be rather hard to obtain. The
large stores of this type will be much more important in providing an estimate of
T than the small ones, and we would like to have a much larger sampling fraction 
of the larger stores. As a preliminary to the sampling design we might make a 
survey of a large fraction of the population of stores, obtaining only simple 
information on size (say the number of employees), and use this to decide on the 
sub-sample which will be investigated to determine $T$.

Suppose we classify the stores in the original sample of $N$ (from a population
of $M$) as large ($N_1$) or small ($N_2$). The corresponding numbers in the population
are $M_1$ and $M_2$. If the sub-sample consists of all the large stores and $n_2$ of the
small ones (sampling fraction $n_2/N_2 = 1/k$), an unbiased estimator of $T$ is

(7.5.1)    $\hat{T} = \frac{1}{f} (T_1 + kT_2)$

where $T_1$ is the total sales for the $N_1$ large stores and $T_2$ is the total for the $n_2$
small stores. Here $f$ is the primary sampling fraction $N/M$. The total sub-sample
size is $N_1 + n_2$. Equation (1) may be written

(7.5.2)    $\hat{T} = \frac{M}{N} \left( \sum_{i=1}^{N_1} X_{1i} + \frac{N_2}{n_2} \sum_{j=1}^{n_2} X_{2j} \right)$

where $X_{1i}$ is the sales figure for the $i$th store in the large group and $X_{2j}$ that for
the $j$th store in the sub-sample of the small group.
The expectation of $\hat{T}$ for a fixed $n_2$, and for a fixed set of $N$ units from which
the sub-sample of size $n_2$ is picked, is given by

(7.5.3)    $E(\hat{T} \mid n_2, N_2) = \frac{M}{N} \left( \sum_{i=1}^{N_1} X_{1i} + \sum_{j=1}^{N_2} X_{2j} \right) = \frac{M}{N} \sum_{i=1}^{N} X_i$

where $X_i$ is now the value for the $i$th store in the sample, regardless of whether
it belongs to the one group or the other. Then

(7.5.4)    $E(\hat{T}) = E[E(\hat{T} \mid n_2, N_2)] = \sum_{i=1}^{M} X_i = T$

The variance of $\hat{T}$, as shown below, is given by

(7.5.5)    $V(\hat{T}) = \frac{M}{N} \left[ (M - N)\sigma^2 + (k - 1) M_2 \sigma_2^2 \right]$

where

$\sigma^2 = (M - 1)^{-1} \sum_{i=1}^{M} (X_i - \mu)^2, \qquad \mu = \sum_{i=1}^{M} \frac{X_i}{M}$

and

$\sigma_2^2 = (M_2 - 1)^{-1} \sum_{i=1}^{M_2} (X_{2i} - \mu_2)^2, \qquad \mu_2 = \sum_{i=1}^{M_2} \frac{X_{2i}}{M_2}$


To prove this we need Eq. (A.14.6) of the Appendix, namely,

$V(X) = E[V(X \mid Y)] + V[E(X \mid Y)]$

where $Y$ stands for a fixed set of $N_2$ small stores and the fixed number $n_2$. The
second term on the right is just the variance of the right-hand side of Eq. (3),
that is, of $M\bar{X}$. This is $M^2 \sigma^2 \left( \frac{1}{N} - \frac{1}{M} \right)$, which is the first term of Eq. (5).

The conditional variance of $\hat{T}$ is

(7.5.6)    $V(\hat{T} \mid n_2, N_2) = \frac{M^2 N_2^2}{N^2} \left( \frac{1}{n_2} - \frac{1}{N_2} \right) s_2^2 = \frac{N_2}{f^2} (k - 1) s_2^2$

where $s_2^2 = (N_2 - 1)^{-1} \sum_{i=1}^{N_2} (X_{2i} - \bar{X}_2)^2$. This follows from Eq. (2), since the
first term is constant (under the stated condition) and the second term is
$(MN_2)/N$ times the sub-sample mean $\bar{X}_2$.

The expectation of Eq. (6) for a given $n_2$ and given $k$ (i.e., for given number
$N_2$ of small units although not for a fixed set of $N_2$) is $(N_2/f^2)(k - 1)\sigma_2^2$, and
the expectation of this for given $k$ is

$\frac{k - 1}{f^2} \sigma_2^2 E(N_2) = \frac{k - 1}{f^2} \sigma_2^2 \frac{NM_2}{M} = (k - 1) \frac{M}{N} M_2 \sigma_2^2$

which is the second term of Eq. (5).

The optimum allocation of sample sizes will depend on the relative costs of 
the first and second sample. These often differ quite considerably. The original 
large sample may, for instance, be obtained by a mailed questionnaire, and a 
sub-sample of the non-responders may be followed up with personal interviews, 
which are considerably more expensive. As a cost function (apart from fixed 
overhead costs) for the whole survey, we might assume 


(7.5.7)    $C = NC_0 + N_1 C_1 + n_2 C_2$

where $C_0$ is the cost of selecting and examining a unit in the large sample, $C_1$
is the additional unit cost for the large units, and $C_2$ is the additional unit cost
for sub-sampling the small units. We want to find the optimum values of $N$ and
$k$ ($= N_2/n_2$) for fixed variance and minimum cost. From Eq. (7),

(7.5.8)    $E(C) = NC_0 + \frac{NM_1}{M} C_1 + \frac{1}{k} \frac{NM_2}{M} C_2$

We wish to minimize this, subject to a fixed value, say $\varepsilon^2$, for the variance of the
estimate $\hat{T}$. The method is to use a Lagrange undetermined multiplier $\lambda$ (see
Appendix A.15) and form the function

(7.5.9)    $F(N, k, \lambda) = E(C) + \lambda [V(\hat{T}) - \varepsilon^2]$

Setting $\partial F/\partial N$ and $\partial F/\partial k$ equal to zero, we get

(7.5.10)    $C_0 + \frac{M_1}{M} C_1 + \frac{M_2}{kM} C_2 - \frac{\lambda}{N^2} \left[ M^2 \sigma^2 + (k - 1) MM_2 \sigma_2^2 \right] = 0$

and

$-\frac{NM_2}{k^2 M} C_2 + \frac{\lambda}{N} MM_2 \sigma_2^2 = 0$

Eliminating $\lambda/N^2$ from these equations, we get

(7.5.11)    $k^2 = \frac{C_2 M\sigma^2 - C_2 M_2 \sigma_2^2}{C_0 M\sigma_2^2 + C_1 M_1 \sigma_2^2} = \frac{MC_2}{M_1} \cdot \frac{\sigma^2/\sigma_2^2 - M_2/M}{C_1 + C_0 M/M_1}$

If we put $V(\hat{T}) = \varepsilon^2$ in Eq. (5), we get an expression for $N$, namely,

(7.5.12)    $N = \frac{M^2 \sigma^2}{M\sigma^2 + \varepsilon^2} \left[ 1 + (k - 1) \frac{M_2 \sigma_2^2}{M\sigma^2} \right]$

where $k$ is given by Eq. (11).
If we took a simple random sample of $N'$ from the population of $M$, large
enough to give the same variance for $\hat{T}$ (now $M\bar{X}$), we should have

$\varepsilon^2 = V(M\bar{X}) = M^2 V(\bar{X}) = M^2 \sigma^2 \left( \frac{1}{N'} - \frac{1}{M} \right)$

so that

(7.5.13)    $N' = \frac{M^2 \sigma^2}{M\sigma^2 + \varepsilon^2}$

This is the first factor in the expression for $N$, Eq. (12). However, although $N$
is larger than $N'$, the cost of the double sample is less than that of the proportionate
single sample.


EXAMPLE 1  Suppose that

$M = 20{,}000$, $M_1 = 1{,}000$ ($M_2 = 19{,}000$),
$\sigma_1^2 = 500$, $\sigma_2^2 = 5$, $\sigma^2 = 34$

(the variances are estimates from a preliminary investigation), and

$C_0 = 0.25$, $C_1 = 2$, $C_2 = 1$ (dollars).

Suppose the preliminary estimate of $T$ is 29,000 units, and we want $\varepsilon$ to be not
more than 0.04 of this, or 1160. Then by Eq. (13),

$N' = 6710$

The cost of a single proportionate sample of this size would be $6710C_0 + 335C_1 + 6375C_2$ = \$8723.

From Eqs. (11) and (12) we find $k^2 = 16.7$, so that $k = 4$, and $N = 9520$.
This gives $N_1 = 476$, $n_2 = (9520 - 476)/4 = 2261$. The cost of the double
sampling method would therefore be $9520C_0 + 476C_1 + 2261C_2$ = \$5593.

This is considerably cheaper than the cost of a single sample to give the same 
precision. 
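
The arithmetic of Example 1 can be reproduced with a short Python sketch. The variable names are assumptions of the sketch, and because no intermediate rounding is applied here the sample sizes and costs come out a few units away from the rounded figures quoted in the example.

```python
M, M1, M2 = 20_000, 1_000, 19_000
var_all, var_small = 34.0, 5.0        # sigma^2 and sigma_2^2
C0, C1, C2 = 0.25, 2.0, 1.0           # dollars per unit
eps = 0.04 * 29_000                   # tolerated standard error, 1160

# Eq. (7.5.11): optimum k^2, then k rounded to an integer as in the example
k2 = C2 * (var_all / var_small - M2 / M) / (C0 + C1 * M1 / M)
k = round(k2 ** 0.5)

# Eq. (7.5.13) for the single sample and Eq. (7.5.12) for the double sample
n_single = M**2 * var_all / (M * var_all + eps**2)
n_double = n_single * (1 + (k - 1) * M2 * var_small / (M * var_all))

def expected_cost(n, k):
    """Eq. (7.5.8); k = 1 corresponds to measuring every sampled small store."""
    return n * (C0 + C1 * M1 / M + C2 * M2 / (k * M))

cost_single = expected_cost(n_single, 1)
cost_double = expected_cost(n_double, k)
```

The double sample is larger in its first phase but considerably cheaper overall, since only about a quarter of the small stores are followed up.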


7.6 Sequential Sampling In any fixed-size sampling procedure the total 
number of items in the sample is decided beforehand and this number of items 
is drawn and examined. However, it is sometimes practicable and economical 
to draw the sample items one at a time and examine them as they are drawn. 
This type of sampling is called sequential. 

Suppose a certain hypothesis $H_0$ regarding the parent population is to be
tested (for instance, the hypothesis that the proportion of defectives in a large
batch of machine parts is not greater than $p_0$). On the basis of the first $m$ sample
items tested we may make one of three decisions: (a) to accept $H_0$, (b) to reject
$H_0$, (c) to test one more item. The process is terminated when our decision rule
leads us to either (a) or (b). The expected number of observations required to 
reach one of these two decisions is less than we would need in order to make the 
same decision on the basis of a single fixed-size sample. Of course, it may 
happen that the sequential procedure will take more observations than the 
fixed-size one (although this is unlikely) and it may not always be convenient in 
practice to take the samples one at a time, but, by and large, sequential sampling 
is a definitely economical procedure. For full details, Wald's book [5] should be 
consulted. 

Sequential testing may be illustrated by the theory of the random walk. 
Suppose $B$, $O$, $A$ are three points on a straight line, where $A$ is $a$ paces to the
right of $O$ and $B$ is $b$ paces to the left. If I start at $O$ and take one pace per
second in a random direction (backwards or forwards) along the line, how long
will it take me to reach either $A$ or $B$? This is the random walk problem in a
simple form. It can be proved that the walk will eventually terminate. The
probability of oscillating back and forth without ever reaching either $A$ or $B$ is
zero. In the sequential decision process each new item tested is like a pace in the 
random walk—it leads towards decision (a) or decision (b). Eventually one of 
these two decisions will be actually reached. 

The type of sequential test suggested by Wald is a likelihood ratio test. Let us
suppose that we are testing a simple null hypothesis $H_0$ against a simple alternative
$H_1$ (see § 6.7). Let $f_0(x_1)$ be the probability (or probability density) that the
variable $X$ takes the value $x_1$ when $H_0$ is true, and similarly
$f_1(x_1) = P(X = x_1 \mid H_1)$. The joint likelihood of the given set of $m$ observations
$x_1, x_2, \ldots, x_m$ under $H_0$ is

$p_{0m} = f_0(x_1) f_0(x_2) \cdots f_0(x_m)$

and the joint likelihood under $H_1$ is

$p_{1m} = f_1(x_1) f_1(x_2) \cdots f_1(x_m)$

The test suggested is to calculate $p_{1m}/p_{0m}$ for each successive value of $m$ and to
continue testing as long as the ratio lies between two specified limits $A$ and $B$
($A > B$). The process is terminated when for the first time either $p_{1m}/p_{0m} \ge A$ or
$p_{1m}/p_{0m} \le B$. In the first case $H_0$ is rejected, and in the second case it is accepted.
If we put $z_i = \log f_1(x_i) - \log f_0(x_i)$, we have

(7.6.1)    $\log \frac{p_{1m}}{p_{0m}} = \sum_{i=1}^{m} \log \frac{f_1(x_i)}{f_0(x_i)} = \sum_{i=1}^{m} z_i = Z_m$

and the test is terminated as soon as $Z_m \ge \log A$ or $Z_m \le \log B$. Since $Z_m$ is a
sum of the $m$ random variables $z_i$, the analogy with a random walk is clear.

The values of $A$ and $B$ are determined by the risks we are prepared to take in
coming to the one decision or the other. If the probability of a rejection error
(an error of the first kind) is $\alpha$ and the probability of an acceptance error (second
kind) is $\beta$, and if $n$ observations lead to the rejection of $H_0$, then
$p_{1n}/p_{0n} = (1 - \beta)/\alpha$, since this is the ratio of the probabilities of $H_1$ and $H_0$ for a sample
which leads to the rejection of $H_0$. Therefore,

(7.6.2)    $\frac{1 - \beta}{\alpha} \ge A$

Similarly,

(7.6.3)    $\frac{\beta}{1 - \alpha} \le B$

In practice we usually take $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$. If the distribution
is such that one extra observation will make little difference to the value of
$p_{1n}/p_{0n}$, there will be no appreciable error in doing this.
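
The decision rule of this section, with the practical choice $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, may be sketched in Python as follows. The function names, the return strings and the sample data are assumptions of the sketch; the caller supplies the log likelihood ratio $z$ for a single observation.

```python
from math import log

def sprt(observations, log_ratio, alpha, beta):
    """Sequential likelihood ratio test with A = (1 - beta)/alpha, B = beta/(1 - alpha).

    log_ratio(x) must return z = log f1(x) - log f0(x).
    Returns (decision, number of observations used)."""
    log_a = log((1 - beta) / alpha)
    log_b = log(beta / (1 - alpha))
    z_sum, m = 0.0, 0
    for m, x in enumerate(observations, start=1):
        z_sum += log_ratio(x)             # Z_m, the running sum of the z_i
        if z_sum >= log_a:
            return "reject H0", m
        if z_sum <= log_b:
            return "accept H0", m
    return "undecided", m                 # data ran out before a decision

def z_normal(x, mu0=0.0, mu1=1.0):
    """z for normal data with unit variance: mean mu0 under H0, mu1 under H1."""
    return x * (mu1 - mu0) - (mu1**2 - mu0**2) / 2

decision, used = sprt([1.2, 0.9, 1.5, 1.1, 0.8], z_normal, alpha=0.05, beta=0.10)
```

With these illustrative observations the running sum first reaches $\log 18$ at the fifth item, so the test stops there and rejects $H_0$.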


* 7.7 Number of Observations Required for a Final Decision in Sequential
Sampling  Let $n$ be the smallest integer for which $Z_n \ge \log A$ or $Z_n \le \log B$.
We would like to find the expected value of $n$ and compare it with the fixed
sample size $N$ which would give the same probabilities $\alpha$ and $\beta$ of error.

Since $Z_n = z_1 + z_2 + \cdots + z_n$ and $n$ is a random variable,

(7.7.1)    $E(Z_n) = E[E(Z_n \mid n)] = E[nE(z)] = E(n) \cdot E(z)$

where $E(z)$ is the expected value of any of the $z_i$.
If the test leads to the rejection of $H_0$, $E(Z_n)$ will be nearly $\log A$, and, if the
test leads to the acceptance of $H_0$, $E(Z_n)$ will be nearly $\log B$, so that

(7.7.2)    $E(Z_n) \approx \gamma \log A + (1 - \gamma) \log B$

where $\gamma$ is the probability of rejecting $H_0$ ($\gamma$ will be $\alpha$ if $H_0$ is true or $1 - \beta$ if
$H_1$ is true). From Eqs. (1) and (2),

(7.7.3)    $E(n) \approx \frac{\gamma \log A + (1 - \gamma) \log B}{E(z)}$

EXAMPLE 2  Suppose the variate $X$ is normally distributed with unit variance,
and that under $H_0$ the mean is $\mu_0$ and under $H_1$ it is $\mu_1$ ($> \mu_0$). Then

(7.7.4)    $f_0(x) = (2\pi)^{-1/2} e^{-(x - \mu_0)^2/2}, \qquad f_1(x) = (2\pi)^{-1/2} e^{-(x - \mu_1)^2/2}$

Let $E_0(n)$, $E_1(n)$ be the expected values of $n$ under $H_0$ and $H_1$ respectively.
Then

(7.7.5)    $E_0(n) \approx \frac{\alpha \log[(1 - \beta)/\alpha] + (1 - \alpha) \log[\beta/(1 - \alpha)]}{E_0(z)}$

           $E_1(n) \approx \frac{(1 - \beta) \log[(1 - \beta)/\alpha] + \beta \log[\beta/(1 - \alpha)]}{E_1(z)}$

From Eq. (4),

$z = \log f_1(x) - \log f_0(x) = -\frac{\mu_1^2 - \mu_0^2}{2} + x(\mu_1 - \mu_0)$

Under $H_0$, $E(x) = \mu_0$, so that

(7.7.6)    $E_0(z) = -\frac{\mu_1^2 - \mu_0^2}{2} + \mu_0 (\mu_1 - \mu_0) = -\tfrac{1}{2} (\mu_1 - \mu_0)^2$

and similarly, $E_1(z) = \tfrac{1}{2} (\mu_1 - \mu_0)^2$. Therefore, from Eq. (5), $E_0(n)$ and $E_1(n)$
may be found.

Now if $N$ is the fixed sample size corresponding to the same values of $\alpha$ and
$\beta$, and we use the statistic $\bar{X}$ (which is normally distributed with mean $\mu_0$ and
variance $1/N$ when $H_0$ is true),

$\alpha = (2\pi)^{-1/2} \int_{\lambda_0}^{\infty} e^{-t^2/2} \, dt, \qquad \lambda_0 = \sqrt{N} (c - \mu_0)$

where $c$ is the critical value for $\bar{X}$. Similarly,

$1 - \beta = (2\pi)^{-1/2} \int_{\lambda_1}^{\infty} e^{-t^2/2} \, dt, \qquad \lambda_1 = \sqrt{N} (c - \mu_1)$

Therefore $\alpha = 1 - \Phi(\lambda_0)$, $\beta = \Phi(\lambda_1)$ and

(7.7.7)    $N = \frac{(\lambda_0 - \lambda_1)^2}{(\mu_1 - \mu_0)^2}$
From Eqs. (5), (6) and (7) it appears that $E_0(n)/N$ and $E_1(n)/N$ are independent
of $\mu_0$ and $\mu_1$ and so may be calculated for any given $\alpha$ and $\beta$. Thus for $\alpha = 0.05$,
$\beta = 0.1$, $\lambda_0 = 1.645$, $\lambda_1 = -1.282$, $(\lambda_0 - \lambda_1)^2 = 8.57$,

$\frac{E_0(n)}{N} = \frac{0.05 \log 18 + 0.95 \log 0.1053}{-\tfrac{1}{2}(8.57)} = 0.465$

$\frac{E_1(n)}{N} = \frac{0.9 \log 18 + 0.1 \log 0.1053}{\tfrac{1}{2}(8.57)} = 0.555$

There is an expected saving of 53.5% if $H_0$ is true or 44.5% if $H_1$ is true, in the
number of observations required.

A lower limit may be calculated for the probability that the sequential
process will terminate before $n$ reaches some preassigned number $n_0$. If this
probability is $P_0(n_0)$ under $H_0$ and $P_1(n_0)$ under $H_1$, then it may be shown that

$P_0(n_0) > \Phi(\delta_0), \qquad P_1(n_0) > 1 - \Phi(\delta_1)$

where

$\delta_0 = \frac{\log B - n_0 E_0(z)}{\sqrt{n_0}\, \sigma_0(z)}, \qquad \delta_1 = \frac{\log A - n_0 E_1(z)}{\sqrt{n_0}\, \sigma_1(z)}$

and $\sigma_0(z)$, $\sigma_1(z)$ are the standard deviations of $z$ under hypotheses $H_0$ and $H_1$.
For Example 2 above, $\sigma_0(z) = \sigma_1(z) = \mu_1 - \mu_0$, since the standard deviation
of $x$ is 1. Therefore,

$\delta_0 = \frac{\log 0.1053 + \frac{n_0}{2} (\mu_1 - \mu_0)^2}{\sqrt{n_0}\, (\mu_1 - \mu_0)}, \qquad \delta_1 = \frac{\log 18 - \frac{n_0}{2} (\mu_1 - \mu_0)^2}{\sqrt{n_0}\, (\mu_1 - \mu_0)}$

With $\alpha = 0.05$ and $\beta = 0.1$ and a sample size of 1000, we could detect a
difference $\mu_1 - \mu_0$ amounting to $2.927/\sqrt{1000} = 0.0926$ (see § 6.10).
With $n_0 = 1000$, $\delta_0 = 0.694$, $\delta_1 = -0.476$, so that $P_0(n_0) > 0.756$ and
$P_1(n_0) > 0.683$. The probability is therefore at least 0.68 that a sequential test of
this kind will terminate in a decision for acceptance or rejection of $H_0$ before
the sample size reaches 1000.
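
The figures quoted for Example 2 can be recomputed with Python's standard library. The sketch below uses `statistics.NormalDist` for $\Phi$ and its inverse; the variable names are assumptions, and the quantity `n_times_d2` stands for $N(\mu_1 - \mu_0)^2$, which by Eq. (7.7.7) depends only on $\alpha$ and $\beta$.

```python
from math import log, sqrt
from statistics import NormalDist

alpha, beta = 0.05, 0.10
phi = NormalDist()                                # standard normal distribution
log_a, log_b = log((1 - beta) / alpha), log(beta / (1 - alpha))

lam0 = phi.inv_cdf(1 - alpha)                     # about 1.645
lam1 = phi.inv_cdf(beta)                          # about -1.282
n_times_d2 = (lam0 - lam1) ** 2                   # N * (mu1 - mu0)^2, Eq. (7.7.7)

# Ratios E(n)/N from Eqs. (7.7.5) and (7.7.6); the mean difference cancels.
ratio_h0 = (alpha * log_a + (1 - alpha) * log_b) / (-0.5 * n_times_d2)
ratio_h1 = ((1 - beta) * log_a + beta * log_b) / (0.5 * n_times_d2)

# Lower bounds for termination by n0 = N, where sqrt(n0) * (mu1 - mu0) = lam0 - lam1
root = sqrt(n_times_d2)
delta0 = (log_b + 0.5 * n_times_d2) / root
delta1 = (log_a - 0.5 * n_times_d2) / root
bound_h0 = phi.cdf(delta0)
bound_h1 = 1 - phi.cdf(delta1)
```

The ratios come out near 0.465 and 0.555, and the two lower bounds near 0.756 and 0.683, agreeing with the values in the example to the accuracy of the tables used there.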
7.8 The Truncated Sequential Test  If it happens that the test is still not
terminated for some $n_0$ beyond which it is not convenient to continue testing, a
simple decision rule is the following: if $Z_{n_0} < 0$, accept $H_0$; if $Z_{n_0} \ge 0$, accept $H_1$.
The probabilities of the two kinds of error, $\alpha(n_0)$ and $\beta(n_0)$, under this rule
are slightly different from $\alpha$ and $\beta$. It may be shown that

(7.8.1)    $\alpha(n_0) \le \alpha - \Phi(\nu_1) + \Phi(\nu_2)$
           $\beta(n_0) \le \beta + \Phi(\nu_3) - \Phi(\nu_4)$

where

$\nu_1 = -\sqrt{n_0}\, E_0(z)/\sigma_0(z), \qquad \nu_2 = \nu_1 + \frac{\log A}{\sqrt{n_0}\, \sigma_0(z)}$

$\nu_3 = -\sqrt{n_0}\, E_1(z)/\sigma_1(z), \qquad \nu_4 = \nu_3 + \frac{\log B}{\sqrt{n_0}\, \sigma_1(z)}$

These are upper bounds and probably higher than necessary. For $n_0 = 1000$
and the data of Example 2, $\nu_1 = -\nu_3 = 1.464$, $\nu_2 = 2.451$, $\nu_4 = -2.233$.
These values give $\alpha(n_0) \le 0.114$, $\beta(n_0) \le 0.159$.

If $N = 100$, and we decide to stop at $n_0 = 300$, whatever happens, the
upper bounds for $\alpha(n_0)$ and $\beta(n_0)$ are 0.052 when $\alpha = \beta = 0.05$, so that truncating
in this way would make very little difference to the probabilities of error.
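
The bounds of Eq. (7.8.1) for the normal case of Example 2 depend only on $\alpha$, $\beta$ and the ratio $n_0/N$, so both numerical cases above can be checked with one small Python function. The function name and its argument `n0_over_n` are assumptions of this sketch.

```python
from math import log, sqrt
from statistics import NormalDist

def truncation_bounds(alpha, beta, n0_over_n):
    """Upper bounds (7.8.1) for Example 2, with truncation at n0 = n0_over_n * N."""
    phi = NormalDist()
    log_a, log_b = log((1 - beta) / alpha), log(beta / (1 - alpha))
    # sqrt(n0) * (mu1 - mu0), using N (mu1 - mu0)^2 = (lam0 - lam1)^2 from Eq. (7.7.7)
    root = sqrt(n0_over_n) * (phi.inv_cdf(1 - alpha) - phi.inv_cdf(beta))
    nu1 = 0.5 * root            # -sqrt(n0) E0(z) / sigma0(z)
    nu2 = nu1 + log_a / root
    nu3 = -0.5 * root           # -sqrt(n0) E1(z) / sigma1(z)
    nu4 = nu3 + log_b / root
    alpha_bound = alpha - phi.cdf(nu1) + phi.cdf(nu2)
    beta_bound = beta + phi.cdf(nu3) - phi.cdf(nu4)
    return alpha_bound, beta_bound

a1, b1 = truncation_bounds(0.05, 0.10, 1.0)   # n0 = N = 1000
a3, b3 = truncation_bounds(0.05, 0.05, 3.0)   # N = 100, n0 = 300
```

Truncating at the fixed sample size roughly doubles the bound on $\alpha$, whereas truncating at three times the fixed sample size leaves both bounds within about 0.002 of the nominal risks.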


7.9 The Sequential Test for a Binomial Distribution  We assume that the
objects in a large group (a “lot” in the language of sampling) can be classified
as either “defective” or “satisfactory”. A lot will be acceptable if the proportion
of defectives $\theta \le \theta'$, but otherwise the buyer will want to refuse it. It is supposed
that the buyer has to make a decision on the basis of a sample, and therefore he
may make either of the two kinds of error we have discussed previously. He may
refuse a good lot or accept a bad one, and must decide what risks he is prepared
to run of making either of these mistakes. Suppose he decides that it would be a
serious matter to refuse a lot with $\theta \le \theta_0$ and that it would be unfortunate to
accept a lot with $\theta \ge \theta_1$, where of course $\theta_0 < \theta' < \theta_1$. He will want to keep
the probability of committing these serious errors down below say $\alpha$ and $\beta$,
respectively, where both these numbers are fairly small compared with 1. Having
decided on $\theta_0$, $\theta_1$, $\alpha$ and $\beta$, he can construct a sequential test.

Randomly selected items from the lot are taken one at a time and inspected.
Suppose the number of defectives in the first $m$ units tested is $d_m$. Then

(7.9.1)    $\frac{p_{1m}}{p_{0m}} = \frac{\theta_1^{d_m} (1 - \theta_1)^{m - d_m}}{\theta_0^{d_m} (1 - \theta_0)^{m - d_m}}$

will give a likelihood ratio test for hypothesis $H_0$ (that $\theta = \theta_0$) against hypothesis
$H_1$ (that $\theta = \theta_1$). If $\theta < \theta_0$ the probability of rejecting the lot will be even less
than for $\theta = \theta_0$, and if $\theta > \theta_1$ the probability of accepting the lot will be less than
for $\theta = \theta_1$. The same test may therefore be regarded as a test for the composite
hypothesis $\theta \le \theta_0$ against the composite hypothesis $\theta \ge \theta_1$.

The test consists in rejecting $H_0$ if $Z_m = \log (p_{1m}/p_{0m}) \ge \log \frac{1 - \beta}{\alpha}$, accepting
$H_0$ if $Z_m \le \log \frac{\beta}{1 - \alpha}$, and continuing the test if neither of these is true. This is
equivalent to setting up an acceptance number $A_m$ and a rejection number $R_m$
for each value of $m$ and continuing the test as long as $A_m < d_m < R_m$. The
numbers $A_m$ and $R_m$ are given by

(7.9.2)    $A_m = \frac{\log \frac{\beta}{1 - \alpha} + m \log \frac{1 - \theta_0}{1 - \theta_1}}{\log \frac{\theta_1}{\theta_0} + \log \frac{1 - \theta_0}{1 - \theta_1}}$

(7.9.3)    $R_m = \frac{\log \frac{1 - \beta}{\alpha} + m \log \frac{1 - \theta_0}{1 - \theta_1}}{\log \frac{\theta_1}{\theta_0} + \log \frac{1 - \theta_0}{1 - \theta_1}}$


Since $A_m$ and $R_m$ depend linearly on $m$, they define a sloping band of constant
width on a diagram with $d_m$ plotted against $m$. For $\alpha = \beta = 0.05$, $\theta_0 = 0.001$
and $\theta_1 = 0.03$, the lines representing $A_m$ and $R_m$ as functions of $m$ are

(7.9.4)    $A_m = 0.00859m - 0.858$
           $R_m = 0.00859m + 0.858$


FIG. 38  SEQUENTIAL BINOMIAL TEST ($d_m$ plotted against $m$)



In Figure 38 an imaginary sampling experiment is represented by the stepped 
line. In the first 81 items tested there were no defectives, the 82nd was defective,
the second defective turned up at the 152nd test, the third at the 221st. This last 
test took the cumulative polygon outside the rejection line and therefore the lot 
was refused. 

A lot under this scheme will be accepted if the first 100 items tested show no 
defectives and will be refused if a defective appears in the first 17. If only one 
defective has appeared in 212, the lot will be accepted; if two appear in the first 
132 it will be refused; and so on.
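
The acceptance and rejection lines of Eqs. (7.9.2) and (7.9.3), and the imaginary experiment of Figure 38, can be followed step by step in Python. The function names and the convention of passing the item numbers of the defectives are assumptions of this sketch.

```python
from math import log

def binomial_sprt_lines(theta0, theta1, alpha, beta):
    """Slope and intercepts of A_m and R_m, Eqs. (7.9.2) and (7.9.3)."""
    g = log((1 - theta0) / (1 - theta1))
    denom = log(theta1 / theta0) + g
    slope = g / denom
    accept_intercept = log(beta / (1 - alpha)) / denom
    reject_intercept = log((1 - beta) / alpha) / denom
    return slope, accept_intercept, reject_intercept

def inspect(defective_positions, max_items, slope, a0, r0):
    """Inspect items 1..max_items; defective_positions are 1-based item numbers.
    Testing continues only while A_m < d_m < R_m."""
    defective = set(defective_positions)
    d = 0
    for m in range(1, max_items + 1):
        d += m in defective               # cumulative number of defectives d_m
        if d >= slope * m + r0:
            return "refuse", m
        if d <= slope * m + a0:
            return "accept", m
    return "undecided", max_items

slope, a0, r0 = binomial_sprt_lines(0.001, 0.03, 0.05, 0.05)
outcome = inspect([82, 152, 221], 300, slope, a0, r0)
```

For the sequence of the figure the lot is refused at the 221st item, and a run with no defectives at all is accepted at the 100th item, as stated in the text.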

The probability $P_a$ of accepting the lot for any given $\theta$ can be expressed as a
function of $\theta$. It decreases from 1, when $\theta = 0$, to 0 when $\theta = 1$. When $\theta = \theta_0$,
$P_a = 1 - \alpha$ and when $\theta = \theta_1$, $P_a = \beta$. If $\theta_0$ and $\theta_1$ are not too far apart, the
approximate value of $P_a$ is given by

(7.9.5)  $P_a = (A^h - 1)/(A^h - B^h)$

where $A = (1 - \beta)/\alpha$, $B = \beta/(1 - \alpha)$ and $h$ is the non-zero root of

(7.9.6)  $\theta\left(\dfrac{\theta_1}{\theta_0}\right)^h + (1 - \theta)\left(\dfrac{1-\theta_1}{1-\theta_0}\right)^h = 1$

that is, of

(7.9.7)  $\theta = \dfrac{1 - \left(\dfrac{1-\theta_1}{1-\theta_0}\right)^h}{\left(\dfrac{\theta_1}{\theta_0}\right)^h - \left(\dfrac{1-\theta_1}{1-\theta_0}\right)^h}$

By choosing various values of $h$, we can calculate corresponding values of
$\theta$ and $P_a$ and plot the curve. This is a sort of operating characteristic or power
curve of the test. It indicates the probability of accepting a lot with any given
proportion of defectives. With the data assumed above there is an even chance
of accepting a lot with $\theta = 0.009$.

The expected number of observations $n$ necessary to reach a final decision,
one way or the other, is given approximately by

(7.9.8)  $E(n) = \dfrac{P_a\log B + (1 - P_a)\log A}{\theta\log\left(\dfrac{\theta_1}{\theta_0}\right) + (1 - \theta)\log\left(\dfrac{1-\theta_1}{1-\theta_0}\right)}$

As a function of $\theta$, this starts at 100 for $\theta = 0$, rises slightly and then decreases
as $\theta$ increases, becoming 1 for $\theta = 1$. For $\theta = 0.02$, $E(n) = 53$ and for $\theta = 0.03$,
$E(n) = 36$ (using the data of Figure 38). The function is indeterminate at
$P_a = 0.5$ ($\theta = 0.0086$).
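The operating characteristic and the expected sample size can be traced numerically by letting $h$ run over a set of non-zero values. The Python sketch below is only an illustration of Eqs. (7.9.5), (7.9.7) and (7.9.8); the function names are assumptions of this example. Note that $h = 1$ reproduces the point $(\theta_0, 1 - \alpha)$ and $h = -1$ the point $(\theta_1, \beta)$.

```python
import math

def oc_point(h, alpha, beta, theta0, theta1):
    """Return (theta, P_a) for a non-zero h, from Eqs. (7.9.7) and (7.9.5)."""
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    u = (theta1 / theta0) ** h
    v = ((1 - theta1) / (1 - theta0)) ** h
    theta = (1 - v) / (u - v)
    p_accept = (a ** h - 1) / (a ** h - b ** h)
    return theta, p_accept

def expected_sample_size(theta, p_accept, alpha, beta, theta0, theta1):
    """Approximate E(n) from Eq. (7.9.8); indeterminate where P_a = 0.5."""
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    num = p_accept * math.log(b) + (1 - p_accept) * math.log(a)
    den = (theta * math.log(theta1 / theta0)
           + (1 - theta) * math.log((1 - theta1) / (1 - theta0)))
    return num / den

if __name__ == "__main__":
    for h in (2.0, 1.0, 0.5, -0.5, -1.0, -2.0):
        theta, p = oc_point(h, 0.05, 0.05, 0.001, 0.03)
        n = expected_sample_size(theta, p, 0.05, 0.05, 0.001, 0.03)
        print(f"h={h:5.1f}  theta={theta:.5f}  P_a={p:.4f}  E(n)={n:.1f}")
```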


7.10 Tolerance Limits  Tolerance limits are limits within which we are
confident that at least a specified proportion of the population will lie (with of
course a specified degree of confidence). We may for instance claim with 95%
confidence that at least 90% of a particular population will have values of X
between given limits. If these limits are the smallest and the greatest values
observed in a sample of N, we may ask how large N should be to justify the claim.

If $x_1$ and $x_N$ are the least and greatest values of X for the sample, and if $f(x)$
is the density function for X, the proportion of the population lying between
$x_1$ and $x_N$ is

(7.10.1)  $v = \int_{x_1}^{x_N} f(x)\,dx$

The density function for $v$ is

(7.10.2)  $g(v) = N(N - 1)v^{N-2}(1 - v)$,  $0 < v < 1$

The probability that $v > \beta$ is therefore

$\int_{\beta}^{1} g(v)\,dv$

and for a confidence coefficient of $1 - \alpha$ this probability is $1 - \alpha$. Integrating
Eq. (2) we obtain

(7.10.3)  $\alpha = N\beta^{N-1} - (N - 1)\beta^{N}$

from which N can be obtained for given $\alpha$ and $\beta$. For $\alpha = 0.05$ and $\beta = 0.99$,
we find N = 473. This means that if we take a random sample of size 473 from a
population in which X is distributed continuously, there is a probability 0.95
that at least 99% of the population will have X values between the least and the
greatest values found in the sample. This result is independent of the form of the
distribution.
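Equation (7.10.3) has no closed-form solution for N, but the right-hand side decreases steadily as N grows, so the smallest adequate N can be found by direct search. The following Python sketch is one way to do this; the function names are assumptions of this example.

```python
def alpha_for(n, beta):
    """Right-hand side of Eq. (7.10.3): the probability that v <= beta."""
    return n * beta ** (n - 1) - (n - 1) * beta ** n

def min_sample_size(alpha, beta):
    """Smallest N for which the extreme sample values cover at least a
    proportion beta of the population with confidence 1 - alpha."""
    n = 2
    while alpha_for(n, beta) > alpha:
        n += 1
    return n

if __name__ == "__main__":
    print(min_sample_size(0.05, 0.99))
```

For $\alpha = 0.05$ and $\beta = 0.99$ the search stops at N = 473, the value quoted in the text.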


PROBLEMS

A. (§ 7.2)
1. Households in a town are stratified into a high-rent stratum (4,000 items) and
a low-rent stratum (20,000 items). The variate X, of which the average is to be estimated,
is thought to have a standard deviation in the first stratum about three times that in
the second. How should a total sample of 1000 be divided between the two strata?
2. The farms in a certain county are stratified according to size in seven strata, as
shown in the table below. For the variate X (the number of acres in corn) the stratum
means $\mu_i$ and the stratum standard deviations $\sigma_i$ are as given. If it is required to take
a sample of 100 farms for estimating some quantity closely related to X, how should
these farms be allocated among the strata (a) with proportionate sampling, (b) with
optimum sampling? Compare the precision of each of these methods with that of
simple random sampling.

Farm Size (Acres)   No. of Farms   $\mu_i$   $\sigma_i$

0 - 40                  394          5.4       8.3
41 - 80                 461         16.3      13.3
81 - 120                391         24.3      15.1
121 - 160               334         34.5      19.8
161 - 200               169         42.1      24.5
201 - 240               113         50.1      26.0
241 -                   148         63.8      35.2

Hint: The precision varies inversely as the variance. The variance of the mean of a
simple random sample is $\sigma^2(M - N)/(MN)$, where $\sigma^2$ is the overall variance of X.
This can be found from the $\sigma_i$ and $\mu_i$ by the formula:
$(M - 1)\sigma^2 = \sum_i (M_i - 1)\sigma_i^2 + \sum_i M_i(\mu_i - \mu)^2$,  $\mu = \sum_i M_i\mu_i/M$.

3. Prove that if $V_r(\bar{x})$ is the variance for a random sample and $V_p(\bar{x})$ that for a
proportionate sample, then

$V_r(\bar{x}) - V_p(\bar{x}) = \dfrac{M - N}{MN(M - 1)}\left[\sum_i M_i(\mu_i - \mu)^2 - \dfrac{1}{M}\sum_i (M - M_i)\sigma_i^2\right]$

4. A variate X is distributed in the population with density $e^{-x}$, $x > 0$. The
population is divided into two strata at the point $x_0$ and a stratified sample of size N is
taken with proportionate sampling. Show that the variance of the sample estimator $\bar{x}$
of the population mean is $N^{-1}[1 - x_0^2e^{-x_0}/(1 - e^{-x_0})]$, and find for what value of
$x_0$ this is least. Hint: The population is infinite, so that the sampling fraction $f$ is zero.
The ratios $M_1/M$ and $M_2/M$ are given by the integral of $e^{-x}$ from 0 to $x_0$ and from $x_0$
to $\infty$, respectively.


B. (§ 7.3)
1. From the following artificial population with three clusters, suppose that two
clusters are selected and two units are selected from each cluster. Find the variance of
the unbiased estimator for $\mu$. If the sampling is proportional to cluster size (one item
from cluster 1, two from 2, three from 3) what is the variance?


Cluster No. (i) Xia Mi 


1 2 
2 1 3 4 
3 3-3; 5; 5 6 


Note that the clusters are widely dissimilar, so that the first term in Eq. (7.3.3) is much 
the larger of the two. 

2. If the sub-sample number for each cluster sampled is proportional to the size
of that cluster, show that the estimator of $\mu$ reduces to $KT/(Mlf)$ where $f$ is the common
sampling fraction $N_i/M_i$ and T is the total of the $X_{i\alpha}$ for all the items in the combined
sample.

3. An alternative method of selecting the $l$ clusters is to sample with probabilities
proportional to cluster size. That is, the probability $\pi_i$ of selecting the $i$th cluster is
$M_i/M$. (Note that this means sampling with replacement. The same cluster may appear
more than once in the sample.)

If the sub-sample size is the same (N) for each cluster sampled, show that $T/(Nl)$ is
an unbiased estimator of $\mu$, where T is defined as in Problem 2. Hint: Let $T_j$ be the
total for the $j$th cluster. Show that, for a given value of $j$, $E(T_j) = N\mu_j$ and that, over
all $j$, $E(\mu_j) = \mu$.
4. The variance of the estimator in Problem 3 is 


XM — и)? + (М: — №до ММ) 

i 
Show that with the data of Problem 1, this estimator has a considerably smaller 
variance than either of those in Problem 1. 


C. (§ 7.4)
1. Show that the variance of the mean of a systematic sample may be written

$V(m_s) = \dfrac{M - 1}{M}\sigma^2 - \dfrac{N - 1}{N}s_w^2$

where

$s_w^2 = \sum_{i}\sum_{j}\dfrac{(X_{ij} - m_i)^2}{k(N - 1)}$

which is the average of the variances within the separate systematic samples. Hint:
$\sum_{ij}(X_{ij} - \mu)^2 = \sum_{ij}(X_{ij} - m_i)^2 + \sum_{ij}(m_i - \mu)^2$, and this last term is $N\sum_i (m_i - \mu)^2$.

2. Prove from the result of Problem 1 that $V(m_s) < V(m_r)$ if and only if $s_w^2 > \sigma^2$,
where $m_r$ is the mean of a random sample of size N. This result indicates that systematic
sampling is more precise than random sampling when the variance within a sample
tends to be larger than that in the whole population, that is, when the sample is markedly
heterogeneous.

3. The following table exhibits an artificial population with a fairly steady rising 
trend; here M — 40, N — 4, k — 10, and each column is a separate systematic sample. 
Calculate the estimator for p from each of these ten samples. Find the variance of these 
estimators and compare with the average variance within samples and the variance of 
the mean of a random sample from the given population. 


Systematic Sample Numbers 


1 2 3 4 5 6 7 8 9 10 
0 1 1 2 5 4 J 7 8 6 
6 8 9 10 13 12 15 16 16 17 


18 19 20 20 24 23 25 28 29 27 
26 30 31 31 33 32 35 37 38 38 


4. Calculate the serial correlation coefficient $\rho_k$ for a lag of $k$, defined by

$k(N - 1)\sigma^2\rho_k = \sum_{p=1}^{k(N-1)} (X_p - \mu)(X_{p+k} - \mu)$

for the data of Problem 3. Hint:
$\sum_p (X_p - \mu)(X_{p+k} - \mu) = \sum_p X_pX_{p+k} - \mu\left(\sum_p X_p + \sum_p X_{p+k}\right) + k(N - 1)\mu^2$

5. If the serial correlation coefficient $\rho_{jk}$ for a lag of $jk$ is defined by

$k(N - j)\sigma^2\rho_{jk} = \sum_{p=1}^{k(N-j)} (X_p - \mu)(X_{p+jk} - \mu)$

and if $\rho_{jk} = (\rho_k)^j$, where $\rho_k$ is the coefficient for a lag of $k$, show that when terms of
order $1/N^2$ are neglected, the ratio of the variances of the two estimators $m_s$ and $m_r$ is
given approximately, for large M, by $V(m_s)/V(m_r) = (1 + \rho_k)/(1 - \rho_k)$.

D. (§ 7.5)

1. In a double sampling scheme where the items are stratified in two classes (e.g.,
large and small), the first sample of N produces $N_1$ large and $N_2$ small items. A random
sub-sample of $n_1$ is drawn from the $N_1$ items and an independent random sample of
$n_2$ from the $N_2$ items, and the variate X is measured on these sub-samples. If $T_1$, $T_2$
are the totals of X for the $n_1$ and $n_2$ items, and T is the total for the whole population,
show that an unbiased estimator of T is $\hat{T} = (hT_1 + kT_2)/f$ and that its variance is

$V(\hat{T}) = \dfrac{(M - N)\sigma^2 + (h - 1)M_1\sigma_1^2 + (k - 1)M_2\sigma_2^2}{f}$

where $h = N_1/n_1$, $k = N_2/n_2$, $f = N/M$.

2. Show that the expression for the variance in Problem 1 may be written, if we
ignore the differences between $M$ and $M - 1$, $M_1$ and $M_1 - 1$, $M_2$ and $M_2 - 1$,

$V(\hat{T}) = \dfrac{1 - f}{f}\left[M_1(\mu_1 - \mu)^2 + M_2(\mu_2 - \mu)^2\right] + \dfrac{h - f}{f}M_1\sigma_1^2 + \dfrac{k - f}{f}M_2\sigma_2^2$

(See the Hint following Problem A-2.)

3. Suppose the farms in Problem A-2 are divided into two classes, called large (over
160 acres) and small (160 acres or under). A first, comparatively cheap, sample of 200
is taken and this gives 40 large and 160 small farms. The variate X is measured on a
subsample of 30 of the large farms and 50 of the small ones. Calculate the variance of
$\hat{T}$ for the double sample and compare it with the variance measured on a random
sample of 100 from the original population.

Farm Size      $M_i$    $\mu_i$    $\sigma_i^2$    $M_i\sigma_i^2$

Large           430     51.6      922        396,500
Small          1580     19.4      312        493,000
Population     2010     26.3      617      1,239,300


4. Find expressions for the optimum values of N, $h$ and $k$, using the method of
§ 7.5 and supposing that only $n_1$ of the $N_1$ large units obtained in the first sample are
sub-sampled (with additional cost $C_1$) and that $h = N_1/n_1$. Hint: Differentiate E
partially with respect to N, $h$ and $k$.

5. Apply the results of Problem 4 to the data of Problem 3, assuming that $C_0 = 0.1$,
$C_1 = 0.8$, $C_2 = 1.0$, and calculate the values of $h$ and $k$ for optimum sampling. If the
standard deviation of the estimator $\hat{T}$ is not to exceed 2650 acres, calculate the size of
the primary sample and the expected cost of getting the required information with
double sampling. Calculate also the size and expected cost of a simple random sample
from the population to give the same precision of estimation.


E. (§§ 7.6-7.10)

1. Suppose that in a certain population the probability $\theta$ that an individual is
defective is either 0.1 or 0.3, but cannot have any other value. We wish to test the
hypothesis $H_0$ that $\theta = 0.1$ against the alternative $H_1$ that $\theta = 0.3$, on the basis of a
fixed-sample test. The test consists in accepting $H_0$ if the number of defectives $d_N$ in a
sample of size N is less than K; otherwise we accept $H_1$. Find N and K if the risks of
error are $\alpha = 0.02$ and $\beta = 0.03$.

2. Construct a sequential acceptance-and-rejection chart for Problem 1 above.

3. Calculate the approximate expected number of trials before a decision is reached
by the method of Problem 2. Hint: Use Eq. (7.9.8), first assuming $H_0$ and then
assuming $H_1$.

4. Perform an imaginary sampling experiment from the population of Problem 1,
as follows: read off a set of one-digit random numbers (say a column from the table
in Appendix B.1), regarding each digit as a sample item. Count each zero as a defective
(this corresponds to hypothesis $H_0$). Continue until a decision is reached, using the
chart constructed in Problem 2, and note the number of trials necessary to reach the
decision. Repeat 20 times, using different sets of random numbers, and note the average
number of trials required. Compare with the result of Problem 3 for $H_0$.

5. Suppose the proportion of defectives in a population can vary from 0 to 1, but
that acceptance limits are fixed at 0.1 and 0.3. Construct the operating characteristic
(or power curve) of the binomial sequential test, with $\alpha = 0.02$, $\beta = 0.03$.

6. Construct a sequential acceptance-and-rejection chart, for testing the binomial
probability $\theta = 0.5$ against the alternative $\theta = 0.7$, given the risks of error as $\alpha = 0.1$,
$\beta = 0.2$. If the following table represents the results of a sequence of trials, $x_m$ being
the number of successes in $m$ trials, show graphically that the sampling terminates with
a decision in favour of $\theta = 0.5$ at the 10th trial.

$m$      1  2  3  4  5  6  7  8  9  10
$x_m$    0  0  1  1  2  3  3  4  4  4


7. Construct a sequential test for the mean $\mu$ of a Poisson distribution, to test
$\mu = \mu_0$ against $\mu = \mu_1$ ($\mu_1 > \mu_0$). Find the expected sample size and the power function
of this test for given $\alpha$ and $\beta$. Hint: The power function for any value of $\mu$ is given by
$P_\mu = (A^h - 1)/(A^h - B^h)$, where $h$ is the non-zero number for which

$\sum_{x=0}^{\infty}\left[p(x, \mu_1)/p(x, \mu_0)\right]^h p(x, \mu) = 1.$

Here $p(x, \mu)$ is the Poisson probability for $x$ successes with parameter $\mu$. Show that $h$
is given by the relation $\mu + h(\mu_1 - \mu_0) = (\mu_1/\mu_0)^h\mu$.

Chapter 8


EXACT TESTS ON SAMPLES FROM A
NORMAL POPULATION


8.1 The Assumption of Normality  We have already obtained in Chapter 5
some moments of the distribution of the sample mean and variance in samples of
size N from a finite population, and we have noted the simplification in these
results when the parent population is supposed to be infinite in size and normal
in distribution. Thus the expectation and variance of $k_1$ are given by

(8.1.1)  $E(k_1) = \mu$,  $V(k_1) = \sigma^2/N$

where $\mu$ and $\sigma$ are the parameters of the normal parent distribution. Also,

(8.1.2)  $E(k_2) = \kappa_2 = \sigma^2$,  $V(k_2) = \dfrac{2\sigma^4}{N - 1}$

furthermore, $k_1$ and $k_2$ are uncorrelated, i.e.,

(8.1.3)  $C(k_1, k_2) = 0$

The skewness and kurtosis of the distribution of $k_1$ were shown to be zero,
which suggested that the distribution of $k_1$ might be normal, but the methods of
Chapter 5 were unsuitable for finding exact distributions. Straightforward 
methods of finding density functions for statistics often lead to intractable 
mathematical expressions, but when the parent population is normal, the 
calculations are relatively simple, and a good deal of work has been done using 
the basic assumption of normality. Throughout this chapter we assume, unless 
otherwise stated, a normal parent population. 

The most popular tests among practical statisticians are tests which depend 
on normality in the parent population, and these tests are often used in situ- 
ations where the assumption of normality is decidedly dubious. Fortunately, 
however, the tests are usually quite robust, which means that considerable 
departures from normality will not affect them very much. When there is grave 
doubt about the assumption, non-parametric (or distribution-free) tests should 
be used, even at some sacrifice of power. 




8.2 The Distribution of the Sample Mean  The fact that the distribution of
the sample mean is normal when the population is normal is easily demonstrated.
We saw in § 2.8 that the cumulant generating function for a linear function
$L\ (= \sum_j C_jX_j)$ of independent variates $X_j$ is given by

(8.2.1)  $K_L(h) = \sum_j K_j(C_jh)$

where $K_j(h)$ is the c.g.f. for $X_j$. If all the $X_j$ are normal with mean $\mu$ and variance
$\sigma^2$, and if $L = \bar{x} = \sum_j X_j/N$,

(8.2.2)  $K_L(h) = \sum_{j=1}^{N}\left(\dfrac{\mu h}{N} + \dfrac{\sigma^2h^2}{2N^2}\right) = \mu h + \dfrac{\sigma^2h^2}{2N}$

and this is the c.g.f. for a normal distribution with mean $\mu$ and variance $\sigma^2/N$.
On the assumption that a distribution is uniquely determined by its c.g.f., this
proves the normality of L. The distribution of the variance and higher
moments is not, however, so easily obtained.


8.3 The Distribution of the Sample Variance  One way of arriving at the
distribution of the variance is to find the joint distribution of the mean and
variance, and then integrate over all possible values of the mean. For simplicity
we will choose the origin for X in such a way that the population mean is zero.
This will have no effect at all on the distribution of the variance. If the N
observed sample values of X are $x_1, x_2, \ldots, x_N$, the joint density function for the
sample is

(8.3.1)  $f(x_1, x_2, \ldots, x_N) = (2\pi\sigma^2)^{-N/2}\exp\left(-\sum_i x_i^2/2\sigma^2\right)$

Now $\sum x_i^2 = \sum (x_i - \bar{x} + \bar{x})^2 = \sum (x_i - \bar{x})^2 + N\bar{x}^2 + 2\bar{x}\sum (x_i - \bar{x})$

where $\bar{x}$ is the sample mean. Since $\sum (x_i - \bar{x}) = 0$ and $\sum (x_i - \bar{x})^2 = Nm_2$,
where $m_2$ is the second sample moment about the mean, we have

(8.3.2)  $\sum x_i^2 = N(\bar{x}^2 + m_2)$

Therefore (1) may be written

(8.3.3)  $f(x_1, x_2, \ldots, x_N) = (2\pi\sigma^2)^{-N/2}e^{-N\bar{x}^2/2\sigma^2}e^{-Nm_2/2\sigma^2}$

Since $m_2$ is proportional to the sample variance $k_2$, the relation being

(8.3.4)  $Nm_2 = (N - 1)k_2$

the distribution of $k_2$ is easily obtainable from that of $m_2$.

There are two methods of proceeding in a problem like this: one is the
analytical method, which involves a good deal of algebraic manipulation; the
other method is geometrical and requires considerable spatial intuition. Fisher's
original approach was geometric, but various writers since have given analytical
proofs. Both methods are explained in detail in [1].

In the analytical method we change the set of variables $x_1, x_2, \ldots, x_N$ to a
new set, of which two will be $\bar{x}$ and $m_2$, the variables which appear in Eq. (3).
We therefore need N − 2 new variables, $w_1, w_2, \ldots, w_{N-2}$, which can be chosen
in the most convenient way and which will later disappear. Then

(8.3.5)  $f(x_1, x_2, \ldots, x_N)\,dx_1\,dx_2 \ldots dx_N = g(w_1, w_2, \ldots, w_{N-2}, \bar{x}, m_2)\,dw_1 \ldots d\bar{x}\,dm_2$

Since, by Eq. (3), $f(x_1, \ldots, x_N)$ is already expressed in terms of the new set,
we merely have to work out the relation between the differentials. This is

(8.3.6)  $dx_1 \ldots dx_N = |J|\,dw_1 \ldots d\bar{x}\,dm_2$

where J is the Jacobian of the old variables with respect to the new (see Appendix
A.4). For a certain particular choice of the w's, we find


(8.3.7) J = 4NO C 2m 73)2 р 


where D is an expression, in the form of a determinant, depending only on the 


w's. It follows that 
NW —1)/2 


= = МЕ... 
(838) оби, ња... 3, ma) = ozgana Ота" > exp| — c " mj] 


If we now integrate over all possible values of the w's (this integration does not
actually have to be carried out), we know that the result must be of the form

(8.3.9)  $h(\bar{x}, m_2) = Cm_2^{(N-3)/2}\exp\left[-\dfrac{N}{2\sigma^2}(\bar{x}^2 + m_2)\right]$

where C is some constant depending on the bounds of integration of the w's
and on the constant factors in Eq. (8). We do not need, at this stage, to know
exactly what it is. Since this joint distribution is of the form $f(\bar{x})$ multiplied by
$g(m_2)$, it is clear that $\bar{x}$ and $m_2$ are independent variates. To obtain the distri-
bution of $m_2$ we simply have to integrate over all possible values of $\bar{x}$ ($-\infty$ to
$+\infty$). This gives

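(8.3.10)  $f(m_2) = \int_{-\infty}^{\infty} h(\bar{x}, m_2)\,d\bar{x}$

          $= Cm_2^{(N-3)/2}e^{-Nm_2/2\sigma^2}\int_{-\infty}^{\infty} e^{-N\bar{x}^2/2\sigma^2}\,d\bar{x}$

          $= C_1m_2^{(N-3)/2}e^{-Nm_2/2\sigma^2}$

where

(8.3.11)  $C_1 = C\left(\dfrac{2\pi\sigma^2}{N}\right)^{1/2}$

The constant $C_1$ can now be found, since

(8.3.12)  $\int_0^{\infty} f(m_2)\,dm_2 = 1$

By the substitution $u = Nm_2/2\sigma^2$, this becomes

$C_1\left(\dfrac{2\sigma^2}{N}\right)^{(N-1)/2}\int_0^{\infty} u^{(N-3)/2}e^{-u}\,du = 1$

from which, since the integral is $\Gamma\left(\dfrac{N-1}{2}\right)$,

(8.3.13)  $C_1 = \dfrac{\left(\dfrac{N}{2\sigma^2}\right)^{(N-1)/2}}{\Gamma\left(\dfrac{N-1}{2}\right)}$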


This gives, with Eq. (10), the distribution of $m_2$. That of $k_2$ is found from
Eq. (4), since

$g(k_2)\,dk_2 = f(m_2)\,dm_2 = f(m_2)\,\dfrac{N-1}{N}\,dk_2$

so that

(8.3.14)  $g(k_2) = \dfrac{N-1}{N}\,C_1\left(\dfrac{(N-1)k_2}{N}\right)^{(N-3)/2}e^{-(N-1)k_2/2\sigma^2}$

          $= \dfrac{\left(\dfrac{N-1}{2\sigma^2}\right)^{(N-1)/2}}{\Gamma\left(\dfrac{N-1}{2}\right)}\,k_2^{(N-3)/2}e^{-(N-1)k_2/2\sigma^2}$

This is more concisely expressed in terms of $n = N - 1$, which is the number
of degrees of freedom in the expression for the variance. With this notation,

(8.3.15)  $g(k_2) = \left(\dfrac{n}{2\sigma^2}\right)^{n/2}\dfrac{k_2^{(n-2)/2}e^{-nk_2/2\sigma^2}}{\Gamma(n/2)}$

If we put $nk_2/\sigma^2 = \chi^2$, this becomes the ordinary $\chi^2$ distribution with $n$
degrees of freedom. Thus,

$g(k_2)\,dk_2 = f(\chi^2)\,d\chi^2 = \dfrac{n}{\sigma^2}f(\chi^2)\,dk_2$

so that

(8.3.16)  $f(\chi^2) = \dfrac{(\chi^2)^{(n-2)/2}e^{-\chi^2/2}}{2^{n/2}\Gamma(n/2)}$

the same as Eq. (4.6.4).

If we put $u = \dfrac{\chi^2}{2} = \dfrac{nk_2}{2\sigma^2}$, $u$ is a gamma variate with parameter $n/2$.
Its density function is

$f(u) = \dfrac{u^{(n-2)/2}e^{-u}}{\Gamma(n/2)}$

and the moments and cumulants can be found from § 4.4. The $r$th cumulant of $u$
is $(n/2)(r - 1)!$, so that the $r$th cumulant of $k_2$ is

(8.3.17)  $\kappa_r = \left(\dfrac{2\sigma^2}{n}\right)^r\dfrac{n}{2}(r - 1)!$
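As a consistency check on Eq. (8.3.17), the first two cumulants should reproduce the mean $\sigma^2$ and the variance $2\sigma^4/(N - 1)$ stated in Eq. (8.1.2). The short Python sketch below evaluates the cumulants in exact rational arithmetic; the function name is an assumption of this example, and $\sigma^2$ is taken to be an integer or a `Fraction`.

```python
from fractions import Fraction
from math import factorial

def k2_cumulant(r, n, sigma2=1):
    """r-th cumulant of the sample variance k_2 with n = N - 1 degrees of
    freedom, following Eq. (8.3.17)."""
    return Fraction(2 * sigma2, n) ** r * Fraction(n, 2) * factorial(r - 1)

if __name__ == "__main__":
    n, sigma2 = 9, 4
    print(k2_cumulant(1, n, sigma2))   # mean of k_2
    print(k2_cumulant(2, n, sigma2))   # variance of k_2
```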




* 8.4 The Geometrical Approach to the Joint Distribution of the Mean and
Variance  Because one cannot visualize space of more than three dimensions,
we will carry through the discussion for N = 3. The argument is similar for
larger samples.

We suppose as before that the population mean is zero. The observed sample
values $x_1, x_2, x_3$ are considered as the coordinates of a point in the sample
space of 3 dimensions. The sample mean $\bar{x}$ is given by

(8.4.1)  $\bar{x} = \tfrac{1}{3}(x_1 + x_2 + x_3)$

For a given value of $\bar{x}$ this equation represents a plane equally inclined to all
three axes.
The sample second moment $m_2$ is given by

(8.4.2)  $m_2 = \tfrac{1}{3}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2\right]$

and for given $\bar{x}$ and $m_2$ this represents a sphere of radius $(3m_2)^{1/2}$ with its
centre at the point $(\bar{x}, \bar{x}, \bar{x})$.

The sphere and plane intersect in a circle of centre M, where $OM = \sqrt{3}\,\bar{x}$
and $MP = (3m_2)^{1/2}$ (see Figure 39).


FIG. 39  ELEMENT OF THREE-DIMENSIONAL SAMPLE SPACE


If $\bar{x}$ increases slightly, the plane of Eq. (1) moves parallel to itself in the
direction OM a distance $d(OM) = dt = \sqrt{3}\,d\bar{x}$.

If $m_2$ also increases, the circle of centre M enlarges, the radius increasing by
an amount $d(MP) = (\sqrt{3}/2)m_2^{-1/2}\,dm_2$, which produces an increase of area

$dA = 2\pi r\,dr = 2\pi(3m_2)^{1/2}(\sqrt{3}/2)m_2^{-1/2}\,dm_2 = 3\pi\,dm_2$

The volume of the ring-shaped element so formed is

(8.4.3)  $dV = dA \cdot dt = 3\sqrt{3}\,\pi\,dm_2\,d\bar{x}$

The probability that a sample point will lie in this element of volume is, by
Eq. (8.3.3),

(8.4.4)  $dP = (2\pi\sigma^2)^{-3/2}\exp\left[-3(\bar{x}^2 + m_2)/2\sigma^2\right]dV$

         $= \dfrac{3\sqrt{3}}{2\sigma^3\sqrt{2\pi}}\exp\left[-3(\bar{x}^2 + m_2)/2\sigma^2\right]dm_2\,d\bar{x}$

and this is the same as Eq. (8.3.9) when we put N = 3. By Eqs. (8.3.11) and
(8.3.13),

$C = \left(\dfrac{N}{2\pi\sigma^2}\right)^{1/2}\dfrac{\left(\dfrac{N}{2\sigma^2}\right)^{(N-1)/2}}{\Gamma\left(\dfrac{N-1}{2}\right)}$

and when N = 3 this becomes

$C = \left(\dfrac{3}{2\pi\sigma^2}\right)^{1/2}\dfrac{3}{2\sigma^2} = \dfrac{3\sqrt{3}}{2\sigma^3\sqrt{2\pi}}$

as in Eq. (4).

The rest of the argument is just as before. In the N-dimensional case, the
hypersphere of N dimensions intersects the hyperplane in a hypersphere of
N − 1 dimensions. The radius is $(Nm_2)^{1/2}$ and the content is $A = K(Nm_2)^{(N-1)/2}$,
where $K = \pi^{(N-1)/2}/\Gamma\left(\dfrac{N+1}{2}\right)$. The element of "volume" is

$\sqrt{N}\,d\bar{x}\,dA = \dfrac{N-1}{2}\,KN^{N/2}m_2^{(N-3)/2}\,dm_2\,d\bar{x}$

and this, with Eq. (8.3.3), gives the same result as Eq. (8.3.9).
8.5 The Distribution of Student's t  If $\bar{x}$ is the mean of a sample of N from
a normal distribution with mean $\mu$ and variance $\sigma^2$, then, as we have already
seen, the variate

(8.5.1)  $z = N^{1/2}\,\dfrac{\bar{x} - \mu}{\sigma}$

is a standard normal variate. If, however, $\sigma$ is replaced by its estimator $s = k_2^{1/2}$,
the quantity

(8.5.2)  $t = N^{1/2}\,\dfrac{\bar{x} - \mu}{s}$

is not normally distributed, except asymptotically for large N. The distribution
of $t$ (actually of $tN^{-1/2}$) was originally obtained by W. S. Gosset, writing
under the pen name of "Student" [2], and the importance of this distribution in
a variety of practical situations was later emphasized by Fisher.

The joint density function for $\bar{x}$ and $s$ may be obtained from Eq. (8.3.9) by
noting that $m_2 = (N - 1)s^2/N$, so that

(8.5.3)  $f(\bar{x}, s)\,d\bar{x}\,ds = h(\bar{x}, m_2)\,d\bar{x}\,dm_2$

         $= C\left(\dfrac{N-1}{N}\right)^{(N-3)/2}s^{N-3}\exp\left[-\dfrac{N\bar{x}^2}{2\sigma^2} - \dfrac{(N-1)s^2}{2\sigma^2}\right]\dfrac{2(N-1)}{N}\,s\,ds\,d\bar{x}$

         $= As^{N-2}\exp\left[-\dfrac{N\bar{x}^2}{2\sigma^2}\right]\exp\left[-\dfrac{(N-1)s^2}{2\sigma^2}\right]d\bar{x}\,ds$

where

$A = 2C\left(\dfrac{N-1}{N}\right)^{(N-1)/2}$

and where we are assuming as before that $\mu = 0$.
If we change the variables from $\bar{x}$ and $s$ to $t$ and $s$, we obtain as the joint
density function for $t$ and $s$

(8.5.4)  $g(t, s) = AN^{-1/2}s^{N-1}\exp\left[-\dfrac{s^2t^2}{2\sigma^2}\right]\exp\left[-\dfrac{(N-1)s^2}{2\sigma^2}\right]$

since, for fixed $s$, $s\,dt = N^{1/2}\,d\bar{x}$ and since $g(t, s)\,dt\,ds = f(\bar{x}, s)\,d\bar{x}\,ds$.
By integrating over all values of $s$, we obtain the desired density function for
$t$, namely,

(8.5.5)  $f(t) = \int_0^{\infty} g(t, s)\,ds = \dfrac{1}{n^{1/2}B\left(\tfrac{1}{2}, \tfrac{n}{2}\right)}\left(1 + \dfrac{t^2}{n}\right)^{-(n+1)/2}$

(see Appendices A.6 and A.7). Here $n = N - 1$ and is called the number of
degrees of freedom for $t$.

The important characteristic of $f(t)$ is that it is independent of $\sigma$, and there-
fore tables of this function can be used to test hypotheses about the mean of a
population, irrespective of what the variance may be.

The graph of $f(t)$ is a symmetrical unimodal curve, tailing off towards zero
at both ends. The tails are higher, and the central peak is higher, than for a
normal curve of the same mean, variance and total area (see Figure 40, drawn
for $n = 4$). As $n$ increases, the curve becomes more and more nearly normal.
To show this we note that

(8.5.6)  $\lim_{n\to\infty}\left(1 + \dfrac{t^2}{n}\right)^{-(n+1)/2} = \lim_{n\to\infty}\left(1 + \dfrac{t^2}{n}\right)^{-1/2}\cdot\lim_{n\to\infty}\left(1 + \dfrac{t^2}{n}\right)^{-n/2} = (1)(e^{-t^2/2})$

(see Appendix A.1), and by using Stirling's approximation (Appendix A.2) it is
easy to prove that

$\lim_{n\to\infty} n^{1/2}B\left(\tfrac{1}{2}, \tfrac{n}{2}\right) = \lim_{n\to\infty}\dfrac{n^{1/2}\,\Gamma\left(\tfrac{1}{2}\right)\Gamma\left(\tfrac{n}{2}\right)}{\Gamma\left(\tfrac{n+1}{2}\right)} = (2\pi)^{1/2}$

The limit of $f(t)$ is therefore $(2\pi)^{-1/2}e^{-t^2/2}$, which is the density function for
a standard normal variate.

FIG. 40  GRAPHS OF THE T-DISTRIBUTION AND NORMAL DISTRIBUTION
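The approach of the $t$ density to the normal density can be examined numerically. The Python sketch below is an illustration only; the function names are assumptions of this example. It uses the identity $1/[n^{1/2}B(\tfrac{1}{2}, \tfrac{n}{2})] = \Gamma(\tfrac{n+1}{2})/[(n\pi)^{1/2}\Gamma(\tfrac{n}{2})]$ and works with logarithms of the gamma function so that large $n$ causes no overflow.

```python
import math

def t_density(t, n):
    """Density of Student's t with n degrees of freedom, Eq. (8.5.5)."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log1p(t * t / n))

def normal_density(z):
    """Density of a standard normal variate."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

if __name__ == "__main__":
    for n in (4, 30, 1000):
        print(n, t_density(0.0, n), t_density(2.0, n))
    print("normal", normal_density(0.0), normal_density(2.0))
```

For $n = 4$ the ordinate at $t = 0$ is 0.375, which exceeds the peak of a normal curve with the same variance ($n/(n - 2) = 2$), in line with the remark about the central peak.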


From symmetry, the odd-order moments of $f(t)$ are all zero, but the $(2r)$th
moment is

(8.5.7)  $\mu_{2r} = \dfrac{2}{n^{1/2}B\left(\tfrac{1}{2}, \tfrac{n}{2}\right)}\int_0^{\infty} t^{2r}\left(1 + \dfrac{t^2}{n}\right)^{-(n+1)/2}dt$

         $= \dfrac{n^r(2r - 1)(2r - 3)\ldots 1}{(n - 2)(n - 4)\ldots(n - 2r)}$

Thus $\mu_2 = \dfrac{n}{n - 2}$, $\mu_4 = \dfrac{3n^2}{(n - 2)(n - 4)}$, so that $\kappa_4/\kappa_2^2 = \mu_4/\mu_2^2 - 3
= 6/(n - 4)$. For $n > 4$ this is always positive.

From Eq. (2), $\dfrac{t^2}{n} = \dfrac{N(\bar{x} - \mu)^2}{ns^2} = \dfrac{N(\bar{x} - \mu)^2/\sigma^2}{ns^2/\sigma^2}$. The numerator of this
fraction is the square of a standard normal variate (and therefore a $\chi^2$ variate
with one d.f.) and the denominator (by § 8.3) is an independent $\chi^2$ variate with $n$
d.f. The fraction itself is therefore the ratio of two gamma variates with para-
meters 1/2 and $n/2$, respectively (see § 4.6), and this ratio may be shown to have a
beta-prime distribution with parameters 1/2, $n/2$. It is often useful to know that
any statistic $t$ has the Student-$t$ distribution if $t^2/n$ is the ratio of two independent
variates distributed respectively as $\chi^2$ with 1 and $n$ degrees of freedom.
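The closed form in Eq. (8.5.7) is easy to evaluate exactly. The following Python sketch (the function name is an assumption of this example) computes the even moments as rational numbers and checks the kurtosis result $6/(n - 4)$ for one value of $n$.

```python
from fractions import Fraction

def t_even_moment(n, r):
    """(2r)-th moment of Student's t with n degrees of freedom, Eq. (8.5.7).
    The moment exists only when n > 2r."""
    if n <= 2 * r:
        raise ValueError("the (2r)-th moment requires n > 2r")
    num = Fraction(n) ** r
    den = Fraction(1)
    for j in range(1, r + 1):
        num *= 2 * j - 1     # builds (2r - 1)(2r - 3)...1
        den *= n - 2 * j     # builds (n - 2)(n - 4)...(n - 2r)
    return num / den

if __name__ == "__main__":
    n = 10
    mu2, mu4 = t_even_moment(n, 1), t_even_moment(n, 2)
    print(mu2, mu4, mu4 / mu2 ** 2 - 3)
```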


8.6 Tables of t and Approximations to t  In Appendix B.4 there is a table of
the integral

(8.6.1)  $\alpha = P(t > t_\alpha) = \int_{t_\alpha}^{\infty} f(t)\,dt$

where $f(t)$ is given by Eq. (8.5.5). This table gives, for all values of $n$ from
1 to 30 and for selected values of the probability $\alpha$, the values of $t_\alpha$ satisfying
Eq. (1); that is, it gives those values of $t$ which in a random sample of size
$n + 1$ from a normal population will be exceeded with probability $\alpha$.

Tables of $t$ often give instead the values which will be exceeded numerically
with probability $\alpha$. This procedure corresponds to the equation

(8.6.2)  $\alpha = P(|t| > t_\alpha) = 1 - \int_{-t_\alpha}^{t_\alpha} f(t)\,dt$

Because of the symmetry of $t$, this probability is just twice that given by
Eq. (1). If the two-tailed probability is wanted from the table in the Appendix,
the probabilities given at the head of the columns should be doubled.

The variance of $t$ is $n/(n - 2)$, so that $\left(\dfrac{n - 2}{n}\right)^{1/2}t$ is a standardized variate
and is approximately normal for fairly large $n$ (say $n > 30$). Thus, the approxi-
mate value of $t$ from Eq. (1) for $n = 30$ and $\alpha = 0.05$ is given by multiplying the
corresponding normal variate (1.645) by $(30/28)^{1/2}$. This gives 1.703, whereas
the correct value is 1.697.


A better approximation, given by Hendricks [3], is

(8.6.3)  $z = \dfrac{2\,\Gamma\left(\dfrac{n+1}{2}\right)t}{\Gamma\left(\dfrac{n}{2}\right)(t^2 + 2n)^{1/2}}$

where $z$ is the normal variate giving the same probability as the actual $t$. Thus,
if $n = 30$ and $t = 1.697$, the corresponding $z$ would be 1.644, which is quite
close to the true value 1.645.


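If $n$ is even, the factor $\Gamma\left(\dfrac{n+1}{2}\right)\Big/\Gamma\left(\dfrac{n}{2}\right)$ in Eq. (3) is equivalent to

$\dfrac{\pi^{1/2}(n - 1)!}{2^{n-1}\left[\left(\dfrac{n-2}{2}\right)!\right]^2}$

and if $n$ is odd it is equivalent to

$\dfrac{2^{n-1}\left[\left(\dfrac{n-1}{2}\right)!\right]^2}{\pi^{1/2}(n - 1)!}$

In either case a good approximation for large $n$ is

(8.6.4)  $\dfrac{\Gamma\left(\dfrac{n+1}{2}\right)}{\Gamma\left(\dfrac{n}{2}\right)} \approx \left(\dfrac{n}{2}\right)^{1/2}\left(1 - \dfrac{1}{4n} + \dfrac{1}{32n^2} + \cdots\right)$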
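An excellent approximation to $t_\alpha$, given by Cornish and Fisher, is the
following:
Let $t_\alpha$ and $z_\alpha$ be defined by

(8.6.5)  $\alpha = \int_{t_\alpha}^{\infty} f(t)\,dt = \int_{z_\alpha}^{\infty} \phi(z)\,dz$

where $\phi(z) = (2\pi)^{-1/2}e^{-z^2/2}$. Then

(8.6.6)  $t_\alpha = z_\alpha + \dfrac{z_\alpha^3 + z_\alpha}{4n} + \dfrac{5z_\alpha^5 + 16z_\alpha^3 + 3z_\alpha}{96n^2} + \cdots$

For $n = 30$ and $\alpha = 0.05$, $z_\alpha = 1.645$, the second term is 0.0508 and the third
is 0.00158, so that $t_\alpha = 1.697$, which is correct to four figures.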
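The three approximations of this section can be compared numerically for $n = 30$. The Python sketch below is only an illustration; the function names are assumptions of this example, and the Cornish and Fisher series is truncated after the term in $1/n^2$.

```python
import math

def t_from_variance_scaling(z, n):
    """Approximate t by scaling the normal variate by (n/(n-2))^(1/2)."""
    return z * math.sqrt(n / (n - 2))

def hendricks_z(t, n):
    """Normal variate corresponding to t, following Eq. (8.6.3)."""
    ratio = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))
    return 2 * ratio * t / math.sqrt(t * t + 2 * n)

def cornish_fisher_t(z, n):
    """First three terms of the Cornish and Fisher series, Eq. (8.6.6)."""
    return (z + (z ** 3 + z) / (4 * n)
            + (5 * z ** 5 + 16 * z ** 3 + 3 * z) / (96 * n ** 2))

if __name__ == "__main__":
    print(t_from_variance_scaling(1.645, 30))
    print(hendricks_z(1.697, 30))
    print(cornish_fisher_t(1.645, 30))
```

Rounded to three decimals, these give 1.703, 1.644 and 1.697 respectively, matching the figures quoted in the text.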

8.7 Confidence Limits for the Population Mean  Equation (8.6.2) may be
written

(8.7.1)  $1 - \alpha = 2\int_0^{t_\alpha} f(t)\,dt$

where $t = N^{1/2}(\bar{x} - \mu)/s$. There is a probability $1 - \alpha$ that the observed value
of the statistic $t$, for a sample of size N from a normal population with mean $\mu$,
will lie within the limits $\pm t_\alpha$ (see Figure 40). This is equivalent to the statement
that the $100(1 - \alpha)\%$ confidence limits for $\mu$, corresponding to observed
values of $\bar{x}$ and $s$ for the sample, are given by

(8.7.2)  $\mu = \bar{x} \pm st_\alpha N^{-1/2}$

EXAMPLE 1 [4]  Electric meters are adjusted to work synchronously with a
standard meter. After adjustment, a sample of 10 meters was tested by means
of precision instruments. If the standard meter is rated at 1000, the observed
ratings for the sample were as given under $x$ in the following table. The question
to be answered is whether the meters tested can reasonably be regarded as a
random sample from a normal population with mean 1000, or whether there is a
systematic deviation from this standard. From the data, $\bar{x} = 994$, $s^2 = (744 -
160)/9 = 64.9$, and if $\alpha = 0.05$, the value of $t_\alpha$ for nine degrees of freedom is
2.262. The 95% confidence limits for $\mu$, therefore, are

$994 \pm 2.262\left(\dfrac{64.9}{10}\right)^{1/2} = 994 \pm 5.8$

or 988.2 to 999.8. Since these limits do not quite include 1000, there is a barely
significant deviation (at the 5% level) from the standard value assumed.
The null hypothesis here is that $\mu = 1000$ and the alternative hypothesis
(which we are led to accept) is that $\mu < 1000$. If $\mu$ were greater than 1000, the
probability of the observed $t$ value (or less) would be less than 2.5%.


TABLE 8.1

   x     u = x − 990    u²
  983        −7          49
 1002        12         144
  998         8          64
  996         6          36
 1002        12         144
  983        −7          49
  994         4          16
  991         1           1
 1005        15         225
  986        −4          16
 Total       40         744
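The arithmetic of Example 1 can be reproduced directly from the ratings in Table 8.1. In the Python sketch below the critical value 2.262 is taken from the text rather than computed, and the function name is an assumption of this example.

```python
import math

# Observed ratings from Table 8.1
ratings = [983, 1002, 998, 996, 1002, 983, 994, 991, 1005, 986]
T_CRIT = 2.262  # two-tailed 5% value of t for nine degrees of freedom (from the text)

def mean_confidence_limits(xs, t_crit):
    """Return (mean, s2, lower, upper) following Eq. (8.7.2)."""
    n_obs = len(xs)
    mean = sum(xs) / n_obs
    s2 = sum((x - mean) ** 2 for x in xs) / (n_obs - 1)
    half_width = t_crit * math.sqrt(s2 / n_obs)
    return mean, s2, mean - half_width, mean + half_width

if __name__ == "__main__":
    print(mean_confidence_limits(ratings, T_CRIT))
```

The limits come out at about 988.2 and 999.8, so the value 1000 lies just outside the interval.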


8.8 Confidence Limits for the Difference of Means in Two Populations  If we
suppose that two independent samples come from two different normal popu-
lations with means $\mu_1$ and $\mu_2$ but with a common variance $\sigma^2$, we can form
confidence limits for the difference $\mu_1 - \mu_2$. If these limits include zero, there
is no significant difference (at the chosen level) between the means.

Suppose the samples are of sizes $N_1$ and $N_2$, with means $\bar{x}_1$, $\bar{x}_2$, and variances
$s_1^2$, $s_2^2$, respectively. An unbiased estimate of $\sigma^2$, based on both samples, is

(8.8.1)  $\hat{\sigma}^2 = \dfrac{n_1s_1^2 + n_2s_2^2}{n_1 + n_2}$

where $n_1 = N_1 - 1$, $n_2 = N_2 - 1$. This follows at once from the fact that
$E(s_1^2) = \sigma^2$ and $E(s_2^2) = \sigma^2$, so that $E(\hat{\sigma}^2) = \sigma^2$.

By hypothesis, $\bar{x}_1$ and $\bar{x}_2$ are both normal with means $\mu_1$ and $\mu_2$ and variances
$\sigma^2/N_1$ and $\sigma^2/N_2$, respectively. Therefore, $\bar{x}_1 - \bar{x}_2$ is normal with mean
$\mu_1 - \mu_2$ and variance $\sigma^2(1/N_1 + 1/N_2)$. If we substitute for $\sigma^2$ the unbiased
estimate of Eq. (1) we obtain the statistic

$\left[\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)\right]\Big/\left[\hat{\sigma}^2\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)\right]^{1/2}$

which has the Student-$t$ distribution with $n_1 + n_2$ degrees of freedom. The
$100(1 - \alpha)\%$ confidence limits for $\mu_1 - \mu_2$ are therefore given by

(8.8.2)  $\mu_1 - \mu_2 = \bar{x}_1 - \bar{x}_2 \pm t_\alpha\left[\dfrac{n_1s_1^2 + n_2s_2^2}{n_1 + n_2}\cdot\dfrac{N_1 + N_2}{N_1N_2}\right]^{1/2}$

EXAMPLE 2  Two batches of concrete were made with slightly different
qualities of sand. From each batch four cylinders were made up and tested for
compressive strength (lb/in.²), with the results shown:

Batch No.    Values of X                 $\bar{x}$
    1        1690, 1580, 1745, 1685      1675
    2        1550, 1445, 1645, 1545      1546

The variances for the two samples are 4750 and 6673, respectively. These values
are close enough to justify an assumption that the two batches do not differ in
variance (a test for this will be given later, § 8.13). The question is whether the
means differ significantly. We have $\bar{x}_1 - \bar{x}_2 = 129$, $n_1 = n_2 = 3$, $N_1 = N_2 = 4$,
and the 95% confidence limits for $\mu_1 - \mu_2$ are

$129 \pm 2.447\left[\dfrac{11423}{2}\cdot\dfrac{1}{2}\right]^{1/2} = 129 \pm 131$, or −2 to 260.

These limits include zero; therefore, at the level of significance chosen, the
hypothesis that there is no difference between the means for the two batches
must be accepted. The observed difference is, however, almost significant at
this level.
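The limits of Example 2 follow from Eq. (8.8.2) and can be recomputed from the raw strengths. In the Python sketch below the critical value 2.447 (six degrees of freedom, two-tailed 5%) is taken from the text, and the function name is an assumption of this example.

```python
import math

batch1 = [1690, 1580, 1745, 1685]
batch2 = [1550, 1445, 1645, 1545]
T_CRIT = 2.447  # two-tailed 5% value of t for n1 + n2 = 6 degrees of freedom (from the text)

def pooled_difference_limits(sample1, sample2, t_crit):
    """Return (difference, lower, upper) for mu1 - mu2, following Eq. (8.8.2)."""
    size1, size2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / size1
    mean2 = sum(sample2) / size2
    ss1 = sum((x - mean1) ** 2 for x in sample1)   # equals n1 * s1^2
    ss2 = sum((x - mean2) ** 2 for x in sample2)   # equals n2 * s2^2
    pooled = (ss1 + ss2) / (size1 + size2 - 2)     # Eq. (8.8.1)
    half_width = t_crit * math.sqrt(pooled * (1 / size1 + 1 / size2))
    diff = mean1 - mean2
    return diff, diff - half_width, diff + half_width

if __name__ == "__main__":
    print(pooled_difference_limits(batch1, batch2, T_CRIT))
```

Working from the unrounded means, the difference is 128.75 and the limits are close to −2 and 260, as in the text.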


8.9 Confidence Limits for the Difference of Means in Paired Samples  In
some types of experimental work, the two samples which are compared are not
independent random samples but are deliberately paired in such a way as to
reduce as much as possible all accidental differences other than those due to the
particular effect which is being investigated. Thus in testing the effect of a drug
on some property of the blood, the same group of experimental animals might
be examined before and after administration of the drug. This procedure renders
the experiment more precise, since it eliminates the random variability between
one group of animals and another; this variability might, or might not, affect
the particular property under investigation. If the sample size is N, the number of
degrees of freedom is $n\ (= N - 1)$ instead of $2n$ as it would be if two independent
random samples of size N had been used.

Again, in comparing the yields of two varieties of apple, one can imagine an 
experiment in which pairs of trees of the two varieties are grown side by side, in a 
dozen different locations. Differences of soil fertility, drainage, chemical 
composition of the soil, etc., will then be almost entirely eliminated from the
comparison of yields, since each variety in any one pair is growing under almost
the same conditions as the other variety, and the method of paired samples would
be applicable.

The method of analysis in such a case is to obtain the N differences between
members of a pair, $d_i = x_{1i} - x_{2i}$, where the subscript 1 refers to one sample
and the subscript 2 to the other. (Thus 1 might refer to an animal before treatment
and 2 to the same animal after treatment.) On the null hypothesis that the
treatment has really no effect (in the case of the apple trees, that the varieties
do not really differ in mean yield), the expectation of $d_i$ is zero. If $s^2$ is the
variance of the differences, i.e.,

(8.9.1)    $n s^2 = \sum d_i^2 - N \bar{d}^2$

the quantity $\bar{d} N^{1/2} s^{-1}$ has the Student-t distribution with n degrees of freedom.
On the hypothesis of a true difference $\delta$ between the means,

(8.9.2)    $(\bar{d} - \delta) N^{1/2} s^{-1} = \pm t_\alpha$

where $t_\alpha$ is the value of t which is exceeded numerically with probability $\alpha$. Then
Eq. (2) can be written

(8.9.3)    $\delta = \bar{d} \pm s t_\alpha N^{-1/2}$

which gives $100(1 - \alpha)\%$ confidence limits for $\delta$.


EXAMPLE 3 The following table gives pH values for the arterial blood of 
dogs (a) breathing normally, (b) after a period of breathing air containing 5% 
carbon dioxide. 


TABLE 8.2

Dog        (a)      (b)      d =
Number     x1       x2       x1 − x2
 1         7.42     7.26     0.16
 2         7.53     7.30     0.23
 3         7.36     7.26     0.10
 4         7.43     7.39     0.04
 5         7.43     7.38     0.05
 6         7.15     6.69     0.46
 7         7.50     7.32     0.18
 8         7.34     7.26     0.08
 9         7.45     7.23     0.22
10         7.42     7.06     0.36
11         7.53     7.34     0.19
12         7.48     7.28     0.20
13         7.42     7.29     0.13


The mean value of d is $\bar{d} = 0.1846$, and $s^2 = (0.6149 - 0.4430)/12 = 0.0143$,
so that $\bar{d} N^{1/2} s^{-1} = 5.57$. For 12 d.f., the 1% value of t (for a two-tailed test) is
3.055, so that the probability is considerably less than 0.01 that a random sample
of 13 animals would exhibit a mean difference as great numerically as that found
if there were really no effect of the treatment. The hypothesis that breathing 5%
of CO₂ has no effect on the pH value for the blood is decisively rejected.

If one could feel quite confident, before the experiment is performed, that if
there is any effect it could be only one way (could result only in a lowered pH
value) one would be justified in using a one-tailed test. The 1% value is then
2.681 and the probability of the observed result on the null hypothesis is even
lower than before.

In terms of confidence limits, the 99% confidence limits for $\delta$ would be
0.1846 ± 0.0332 (3.055), or 0.083 to 0.286. This expresses in another way the
fact that there is a highly significant difference produced by the treatment.
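
As a check on Example 3, the following Python sketch applies Eqs. (8.9.1) and (8.9.3) to the data of Table 8.2. The variable names are illustrative, and the 1% two-tailed t value for 12 d.f. is taken from the text rather than computed.

```python
import math

# pH values from Table 8.2: (a) breathing normally, (b) after breathing 5% CO2
normal = [7.42, 7.53, 7.36, 7.43, 7.43, 7.15, 7.50, 7.34, 7.45, 7.42, 7.53, 7.48, 7.42]
with_co2 = [7.26, 7.30, 7.26, 7.39, 7.38, 6.69, 7.32, 7.26, 7.23, 7.06, 7.34, 7.28, 7.29]

d = [a - b for a, b in zip(normal, with_co2)]
N = len(d)
n = N - 1                                    # degrees of freedom

d_bar = sum(d) / N
s_sq = (sum(x * x for x in d) - N * d_bar ** 2) / n   # Eq. (8.9.1)
s = math.sqrt(s_sq)
t = d_bar * math.sqrt(N) / s                 # Student-t statistic on n d.f.

T_99_12DF = 3.055                            # two-tailed 1% point, 12 d.f. (from the text)
half_width = T_99_12DF * s / math.sqrt(N)    # Eq. (8.9.3)
limits = (d_bar - half_width, d_bar + half_width)

print(f"mean difference = {d_bar:.4f}, t = {t:.2f}")
print(f"99% limits for delta: {limits[0]:.3f} to {limits[1]:.3f}")
```

Working from the differences alone is what distinguishes the paired analysis from the two-sample procedure of §8.8.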


* 8.10 The t-Test and Maximum Likelihood Suppose the null hypothesis
$H_0$ is that our sample of N values of X comes from a normal population with
mean $\mu_0$ and variance $\sigma^2$, and the alternative hypothesis $H_1$ is that the sample
comes from a normal population with mean $\mu_1$ ($> \mu_0$) and the same variance $\sigma^2$.
For convenience, we can take $\mu_0$ as zero (this simply means subtracting a
constant amount $\mu_0$ from all the observed values of X).

Under $H_0$, the likelihood for the particular set of observed values $x_1, x_2, \ldots, x_N$ is

(8.10.1)    $L_0 = (2\pi\sigma^2)^{-N/2} \exp\left[-\sum x_i^2 / 2\sigma^2\right]$

while under $H_1$ it is

(8.10.2)    $L_1 = (2\pi\sigma^2)^{-N/2} \exp\left[-\sum (x_i - \mu_1)^2 / 2\sigma^2\right]$

The region of rejection R must satisfy the condition

(8.10.3)    $\int_{(R)} L_0 \, dx_1 \cdots dx_N = \alpha$

for a fixed value $\alpha$ of the probability of wrongly rejecting $H_0$. However, R can
be chosen in many ways. To make the test as powerful as possible we should
choose R to maximize the probability of rejecting $H_0$ when $H_1$ is true. That is,
the probability

(8.10.4)    $P = \int_{(R)} L_1 \, dx_1 \cdots dx_N$

is to be a maximum subject to the condition of (3). This implies that we should
maximize $\int_{(R)} (L_1 - \lambda L_0) \, dx_1 \cdots dx_N$ without restriction, $\lambda$ being a Lagrange
multiplier (see Appendix A.15). If we include in R all the points for which
$L_1 - \lambda L_0 > 0$ and exclude all points for which $L_1 - \lambda L_0 < 0$ we shall make the
integral as large as possible. The boundary of R is given by $L_1 - \lambda L_0 = 0$.
This equation is equivalent to

$\log L_1 = \log \lambda + \log L_0$

which reduces to

$\sum x_i^2 - \sum (x_i - \mu_1)^2 = c, \qquad c = 2\sigma^2 \log \lambda$

This is again equivalent to $\bar{x} = c_1$, where $c_1$ is another constant depending on
$\sigma$, $\mu_1$, N and $\lambda$. The region of rejection is defined by $\bar{x} > c_1$.
The situation can perhaps be appreciated geometrically for the special case
N = 3, as in §8.4. The more general case requires a familiarity with N-dimensional
geometry. The likelihood $L_0$ is constant on the sphere $\sum x_i^2$ =
constant (with center at the origin). Equation (3) implies that on any such sphere
we can pick a region of rejection R equal in area to a fixed fraction $\alpha$ of the
surface, and we can agree to reject $H_0$ when the sample point lies in this
region. The maximum likelihood condition implies that this region is
the "cap" cut off on the sphere by the plane $\bar{x} = c_1$, this plane being
perpendicular to a line equally inclined to all three axes. (See Fig. 41, where
the shaded area represents the cap cut off the sphere by the plane.) The
boundaries of all the caps for different spheres lie on a circular cone, with
vertex at the origin.

FIG. 41 REGION OF REJECTION OF $H_0$ WHEN N = 3


The plane $\bar{x} = c_1$ lies at a distance $\sqrt{3}\, c_1$ from the origin. The sphere
$\sum x_i^2 = c_2$ is of radius $\sqrt{c_2}$ and the fractional area of the cap is given by

(8.10.5)    $\alpha = \dfrac{\sqrt{c_2} - \sqrt{3}\, c_1}{2\sqrt{c_2}}$

Now the sample variance for a sample with representative point $(x_1, x_2, x_3)$,
lying on the intersection of the sphere and the plane, is

(8.10.6)    $s^2 = \tfrac{1}{2}[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2]$
            $= \tfrac{1}{2}(x_1^2 + x_2^2 + x_3^2 - 3\bar{x}^2)$
            $= \tfrac{1}{2}(c_2 - 3c_1^2)$

and t for this sample is

(8.10.7)    $t_\alpha = \dfrac{\sqrt{3}\, \bar{x}}{s} = \dfrac{\sqrt{6}\, c_1}{(c_2 - 3c_1^2)^{1/2}}$

For N = 3 (and therefore n = 2) the distribution of Student's t reduces to

(8.10.8)    $f(t) = (2\sqrt{2})^{-1} (1 + t^2/2)^{-3/2}$

and the integral of this from $t_\alpha$ to $\infty$ is

(8.10.9)    $\int_{t_\alpha}^{\infty} f(t) \, dt = \tfrac{1}{2}\left[1 - t_\alpha (2 + t_\alpha^2)^{-1/2}\right]$

Substituting for $t_\alpha$ the value given by Eq. (7) we find precisely the $\alpha$ of
Eq. (5), so that the one-tailed t-test (namely, reject $H_0$ when $t > t_\alpha$) is the same
as the maximum likelihood test described above. It follows from Eqs. (5) and
(7) that $t_\alpha = (1 - 2\alpha)(2\alpha - 2\alpha^2)^{-1/2}$, so that the value of t for such a sample is
independent of the value of $\sigma^2$ and of the particular value of $\mu_1$ chosen for
hypothesis $H_1$. The test is therefore uniformly most powerful against any $H_1$
with $\mu_1 > 0$.

Similarly, if $\mu_1 < 0$ the test $-t > t_\alpha$ is uniformly most powerful. However, if
$\mu_1$ may be greater or less than zero, no uniformly most powerful test exists,
except in the special class of unbiased tests. A test is said to be unbiased if its
power function for testing the hypothesis that a parameter $\theta$ is equal to $\theta_0$ has a
minimum at the value $\theta_0$. The ordinary two-tailed t test ($|t| > t_\alpha$) provides a
uniformly most powerful unbiased test of $H_0$ against $H_1$, where $H_1$ is the
hypothesis $|\mu_1| > 0$.
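
The agreement between the cap area of Eq. (8.10.5) and the t tail area of Eq. (8.10.9) for N = 3 can be confirmed numerically. The Python sketch below is only a check under assumed values: the constants `c1` and `c2` are arbitrary (subject to $c_2 > 3c_1^2$) and the function names are illustrative.

```python
import math


def cap_fraction(c1, c2):
    """Fractional area of the spherical cap, Eq. (8.10.5)."""
    return (math.sqrt(c2) - math.sqrt(3) * c1) / (2 * math.sqrt(c2))


def t_on_boundary(c1, c2):
    """Value of t on the intersection of sphere and plane, Eqs. (8.10.6)-(8.10.7)."""
    s = math.sqrt(0.5 * (c2 - 3 * c1 ** 2))
    return math.sqrt(3) * c1 / s


def t2_upper_tail(t):
    """Upper tail area of Student's t with 2 d.f., Eq. (8.10.9)."""
    return 0.5 * (1 - t / math.sqrt(2 + t * t))


c1, c2 = 1.0, 5.0            # arbitrary illustrative constants with c2 > 3*c1**2
alpha = cap_fraction(c1, c2)
t_alpha = t_on_boundary(c1, c2)

print(f"cap fraction      = {alpha:.6f}")
print(f"t tail area       = {t2_upper_tail(t_alpha):.6f}")
print(f"t from alpha only = {(1 - 2 * alpha) / math.sqrt(2 * alpha - 2 * alpha ** 2):.6f}")
```

The last line evaluates t from $\alpha$ alone, which illustrates why the critical value depends on neither $\sigma^2$ nor $\mu_1$.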


* 8.11 The Power of the t-Test The power is the value of P given by Eq.
(8.10.4) subject to the condition of (8.10.3). As in §8.5, the probability
$L_1 \, dx_1 \cdots dx_N$ can be expressed as $f(\bar{x}, s) \, d\bar{x} \, ds$, except that now the population mean
is to be taken as $\mu_1$ instead of zero. We have

(8.11.1)    $f(\bar{x}, s) = A s^{n-1} \exp\left(-\dfrac{n s^2}{2\sigma^2}\right) \exp\left[-\dfrac{N(\bar{x} - \mu_1)^2}{2\sigma^2}\right]$

where $A = N^{1/2} n^{n/2} \big/ \left[2^{(n-1)/2} \pi^{1/2} \sigma^{n+1} \Gamma(n/2)\right]$, with n = N − 1 as usual.

If we define t as $t = N^{1/2} \bar{x}/s$, we can find P by integrating Eq. (1) over all
$\bar{x} > N^{-1/2} s t_\alpha$ and over all s from 0 to $\infty$. For any point in the region so
delimited, t will be greater than $t_\alpha$. Therefore,

(8.11.2)    $P = A \int_0^\infty s^{n-1} e^{-n s^2 / 2\sigma^2} \int_{N^{-1/2} s t_\alpha}^{\infty} \exp\left[-\dfrac{N(\bar{x} - \mu_1)^2}{2\sigma^2}\right] d\bar{x} \, ds$

On putting $z = N^{1/2}(\bar{x} - \mu_1)/\sigma$ and $\chi^2 = n s^2/\sigma^2$, so that z is a standard normal
variate and $\chi^2$ is the ordinary $\chi^2$ variate with n degrees of freedom, Eq. (2)
becomes

(8.11.3)    $P = \dfrac{1}{2\Gamma(n/2)} \int_0^\infty \left(\dfrac{\chi^2}{2}\right)^{(n-2)/2} e^{-\chi^2/2} \int_y^\infty \phi(z) \, dz \, d(\chi^2)$

where

$\phi(z) = (2\pi)^{-1/2} e^{-z^2/2}$

and

(8.11.4)    $y = \dfrac{t_\alpha \chi}{n^{1/2}} - \dfrac{N^{1/2} \mu_1}{\sigma}$

This integral can be evaluated numerically for given values of n, $\alpha$, and
$\mu_1/\sigma$. The power function depends on $\sigma$ (through the quantity y) and therefore
some preliminary information about $\sigma$ is necessary if we want to use the power
function. We could, for example, calculate the size of sample necessary to
detect, with a given probability, a given deviation of $\mu_1/\sigma$ from zero, but without
the information about $\sigma$ itself we cannot say anything about $\mu_1$.
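
One way to carry out the numerical evaluation of Eq. (8.11.3) is sketched below in Python. This is a hedged illustration, not the method used for any published table: the function names, the use of Simpson's rule, the integration cut-off, and the example values (N = 17, one-tailed 5% point 1.746 for 16 d.f.) are all assumptions.

```python
import math


def normal_upper_tail(y):
    """Integral of phi(z) from y to infinity."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))


def chi2_density(u, n):
    """Density of chi-square with n d.f. at u (the weight in Eq. 8.11.3); n > 2 assumed."""
    if u <= 0.0:
        return 0.0
    return 0.5 * math.exp((n / 2 - 1) * math.log(u / 2) - u / 2 - math.lgamma(n / 2))


def t_test_power(t_alpha, N, mu1_over_sigma, steps=4000):
    """Power of the one-tailed t-test by Simpson's rule applied to Eq. (8.11.3)."""
    n = N - 1
    shift = math.sqrt(N) * mu1_over_sigma
    upper = n + 40.0 * math.sqrt(2.0 * n)      # cut-off far into the chi-square tail
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        y = t_alpha * math.sqrt(u / n) - shift  # Eq. (8.11.4) with chi = sqrt(u)
        total += weight * chi2_density(u, n) * normal_upper_tail(y)
    return total * h / 3.0


for ratio in (0.0, 0.3, 0.6, 0.9):
    print(f"mu1/sigma = {ratio:.1f}  power = {t_test_power(1.746, 17, ratio):.3f}")
```

When $\mu_1 = 0$ the computed power reduces to the significance level, which gives a simple check on the integration.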
* 8.12 The Non-Central t-Distribution If the distribution of t on hypothesis
$H_1$ is calculated as in §8.5 from Eq. (8.11.1), we obtain, after some reduction,

(8.12.1)    $f_1(t) = c \left(1 + \dfrac{t^2}{n}\right)^{-(n+1)/2} \exp\left[-\dfrac{n\delta^2}{2(n + t^2)}\right] Hh_n(k)$

where

$c = \dfrac{n!}{2^{(n-1)/2} \, \Gamma(n/2) \, (\pi n)^{1/2}}, \qquad k = -\dfrac{\delta t}{(n + t^2)^{1/2}}$

and

(8.12.2)    $Hh_n(k) = \int_0^\infty \dfrac{v^n}{n!} e^{-(v + k)^2/2} \, dv$

This function is known as Airey's function. The quantity $\delta$, defined by

(8.12.3)    $\delta = \dfrac{N^{1/2} \mu_1}{\sigma}$

is called the non-centrality parameter. Any statistic of the form

(8.12.4)    $t = (z + \delta) w^{-1/2}$

where z is a standard normal variate and nw is an independent variate distributed
as $\chi^2$ with n degrees of freedom, has the non-central t-distribution.

When $\delta = 0$, $k = 0$ and $Hh_n(0) = 2^{(n-1)/2} \, \Gamma\left(\dfrac{n+1}{2}\right) \Big/ n!$

The density function then reduces to that for the ordinary Student-t, Eq.
(8.5.5).

The power of the t test is the integral of $f_1(t)$ from $t_\alpha$ to $\infty$,

(8.12.3)    $P = \int_{t_\alpha}^\infty f_1(t) \, dt$

where $t_\alpha$ is given by

(8.12.4)    $\alpha = \int_{t_\alpha}^\infty f(t) \, dt$

This gives the same result as Eq. (8.11.3).

Extensive tables of non-central t have recently been provided by Resnikoff
and Lieberman [5]. These give the density function $f_1(t)$, the cumulative
distribution function $F_1(t)$, and certain percentage points of the distribution. Since
$f_1(t)$ depends on two parameters, n and $\delta$, the tables are of triple entry. Values of
n go from 2 to 49, 2(1)24(5)49.* The argument used is $t/\sqrt{n}$ instead of t, this
arrangement being more convenient for tabulation. The parameter $\delta$ is expressed
in terms of $z_\alpha$, where $z_\alpha$ is the standard normal variate exceeded with probability
$\alpha$, as in Eq. (8.6.5), the quantity tabulated being $\delta = N^{1/2} z_\alpha$, for ten selected
values of $\alpha$ from 0.001 to 0.25.

*This notation means that n goes by steps of 1 to n = 24 and then by steps of 5 to n = 49.
A rough approximation to the non-centrality parameter for moderate-sized
samples is

(8.12.5)    $\delta \approx t_\alpha - z_P \left(1 + \dfrac{t_\alpha^2}{2n}\right)^{1/2}$

where $z_P$ is the standard normal variate exceeded with probability P. If $\beta$ is the
probability of error of the second kind, $P = 1 - \beta$. This approximation is
useful in estimating the difference in the mean which can be detected with given
probabilities of error.

EXAMPLE 4 Suppose we are using a sample size N = 17, and are willing to
allow errors $\alpha = 0.05$, $\beta = 0.2$. That is, we will accept a risk 0.05 of wrongly
rejecting the hypothesis that $\mu = 0$ and a risk 0.2 of wrongly accepting it
when in fact $\mu = \mu_1$. How large must $\mu_1$ be?

Using the approximation Eq. (5), with n = 16,

$t_\alpha = 1.746, \qquad z_P = -0.842$
$\delta = 1.746 + 0.842(1.0466) = 2.627$

and therefore $\mu_1/\sigma = \delta/\sqrt{17} = 0.637$. This means that we should have about an
80% chance of detecting a real difference in the mean, from the assumed value
zero, equal to about 0.64 times the standard deviation.
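
The approximation of Eq. (8.12.5) is easy to script. In the Python sketch below the function name is hypothetical, $z_P$ is obtained from the standard library's `NormalDist`, and the one-tailed 5% point of t for 16 d.f. is taken from the example.

```python
import math
from statistics import NormalDist


def approx_noncentrality(t_alpha, n, power):
    """Rough non-centrality parameter from Eq. (8.12.5).

    z_P is the standard normal value exceeded with probability P = power,
    so it is negative whenever the power exceeds 0.5.
    """
    z_P = NormalDist().inv_cdf(1.0 - power)
    return t_alpha - z_P * math.sqrt(1.0 + t_alpha ** 2 / (2.0 * n))


N = 17
delta = approx_noncentrality(t_alpha=1.746, n=N - 1, power=0.8)
detectable = delta / math.sqrt(N)            # mu1 / sigma

print(f"delta = {delta:.3f}, mu1/sigma = {detectable:.3f}")
```

Repeating the call for several values of N gives a quick way to see how the detectable difference shrinks as the sample grows, provided the matching $t_\alpha$ is supplied for each n.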

Some tables compiled by Neyman and Tokarska [6] are rather more convenient
than the larger tables [5] for this particular type of problem. They give
for $\alpha = 0.05$ and 0.01 and for n = 1(1)30, the values of $\delta$ corresponding to
selected values of $\beta$. From these tables, with $\alpha = 0.05$, $\beta = 0.2$, and n = 16,
we find $\delta = 2.60$, giving $\mu_1/\sigma = 0.631$.

* 8.13 Sampling Inspection by Variables A procedure which depends upon
non-central t is that of accepting or rejecting a lot according to the percentage of
defectives p, where "defective" is defined as meaning that a measured random
variable X has a value above some fixed standard u. This variate X is supposed
to be normal with unknown parameters $\mu$ and $\sigma$. The method is to measure the
mean m and the standard deviation s for a sample of N and accept the lot if

(8.13.1)    $m + ks < u$

where k is a constant. This criterion can be written as

$\dfrac{\sqrt{N}(u - m)}{s} > \sqrt{N} k$

or

(8.13.2)    $\dfrac{\sqrt{N}(\mu - m)/\sigma + \sqrt{N}(u - \mu)/\sigma}{s/\sigma} > \sqrt{N} k$

Now $n s^2/\sigma^2$ is distributed as $\chi^2$ with n degrees of freedom, and $\sqrt{N}(m - \mu)/\sigma$
is a standard normal variate, so that the left-hand side of (2) has a non-central t
distribution* with n (= N − 1) d.f. and non-centrality parameter given by

(8.13.3)    $\delta = \dfrac{\sqrt{N}(u - \mu)}{\sigma} = \sqrt{N} z_p$

where $z_p$ is the standard normal variate exceeded with probability p. The
acceptance criterion is therefore

(8.13.4)    $t = \dfrac{\sqrt{N}(u - m)}{s} > \sqrt{N} k$

The power function (the probability of accepting the lot) is

(8.13.5)    $P = \int_{\sqrt{N} k}^\infty f_1(t) \, dt$

and several values can be found by interpolation in the tables of Resnikoff and
Lieberman for given N and k and different p. Thus if N = 10 and k = 1.72, we
find that when p = 0.10 (corresponding to $\delta = 4.053$), P = 0.224, and when
p = 0.004 (corresponding to $\delta = 8.386$), P = 0.969.
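
The non-centrality parameters quoted above follow directly from Eq. (8.13.3), and the acceptance probability of Eq. (8.13.5) can be approximated without tables by integrating over the $\chi^2$ variate. The Python sketch below is an assumption-laden illustration (hypothetical function names, Simpson's rule, an arbitrary cut-off); it is not the procedure used to build the Resnikoff and Lieberman tables, so small differences from the tabulated values are to be expected.

```python
import math
from statistics import NormalDist


def noncentrality(N, p):
    """delta = sqrt(N) * z_p, Eq. (8.13.3); z_p is exceeded with probability p."""
    return math.sqrt(N) * NormalDist().inv_cdf(1.0 - p)


def chi2_density(u, n):
    """Density of chi-square with n d.f. (n > 2 assumed)."""
    if u <= 0.0:
        return 0.0
    return 0.5 * math.exp((n / 2 - 1) * math.log(u / 2) - u / 2 - math.lgamma(n / 2))


def acceptance_probability(N, k, p, steps=4000):
    """P(non-central t > sqrt(N) k), Eq. (8.13.5), by Simpson's rule."""
    n = N - 1
    delta = noncentrality(N, p)
    t0 = math.sqrt(N) * k
    upper = n + 40.0 * math.sqrt(2.0 * n)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # given chi-square = u, t > t0 exactly when z > t0 * sqrt(u/n) - delta
        tail = 0.5 * math.erfc((t0 * math.sqrt(u / n) - delta) / math.sqrt(2.0))
        total += weight * chi2_density(u, n) * tail
    return total * h / 3.0


for p in (0.10, 0.004):
    print(f"p = {p}: delta = {noncentrality(10, p):.3f}, "
          f"P(accept) = {acceptance_probability(10, 1.72, p):.3f}")
```

Evaluating the function over a range of p traces out the whole power curve of a plan with given N and k.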

Conversely, if we fix two points on the power curve, we can find N and k
and so set up a sampling acceptance plan. Thus suppose we want the values of P
corresponding to $p_1 = 0.01$ and $p_2 = 0.15$ to be $1 - \alpha$ (= 0.99) and $\beta$ (= 0.10),
respectively. That is, if p is as low as 0.01, we shall be almost certain (probability
0.99) to accept the lot. If p is as high as 0.15, we shall be very likely (probability
0.9) to reject it. The corresponding values of N and k are found by trial and
error, using the tables. We want to find two consecutive values of n, say n − 1
and n, such that for n − 1 there is a t′ for which simultaneously

$\int_{t'}^\infty f_1(t) \, dt < 0.99, \qquad \delta = \sqrt{N - 1}\, z_{0.01} = 2.326\sqrt{N - 1}$

and

$\int_{t'}^\infty f_1(t) \, dt > 0.10, \qquad \delta = \sqrt{N - 1}\, z_{0.15} = 1.036\sqrt{N - 1}$

while for n there is a t″ for which simultaneously

$\int_{t''}^\infty f_1(t) \, dt > 0.99, \qquad \delta = \sqrt{N}\, z_{0.01} = 2.326\sqrt{N}$

and

$\int_{t''}^\infty f_1(t) \, dt < 0.10, \qquad \delta = \sqrt{N}\, z_{0.15} = 1.036\sqrt{N}$

We find that for 16 d.f., with p = 0.01 and t′/4 = 1.55, P is 0.9896, while for
17 d.f. and the same p and t″, P is 0.9942. Also, for 16 d.f. and t′/4 = 1.55, with
p = 0.15, P = 0.1064 while for 17 d.f. with the same p and t″, P = 0.079. We
can therefore choose n = 17 (N = 18). Here it happened that t′ and t″ are
identical. With n = 17, we find by interpolation that t′ = 6.43 corresponds to
P = 0.99; k is therefore $6.43/\sqrt{18} = 1.516$. The sampling plan is to take a
sample of size 18 and accept the lot if t > 6.43. This will be a little stricter than
desired but near enough for practical purposes.

*The negative of this side has the non-central t distribution with parameter $-\delta$, but this
is the same as saying that the side itself is non-central t with parameter $\delta$.


8.14 Confidence Limits for the Variance of a Population Since the quantity
$n s^2/\sigma^2$ is distributed as $\chi^2$ with n d.f. for samples from a normal parent
population, it is easy to construct confidence limits for $\sigma^2$ corresponding to a given
sample variance. It is merely necessary to find from a table of $\chi^2$ the values
$\chi_1^2$ and $\chi_2^2$ such that

(8.14.1)    $\int_{\chi_1^2}^\infty f(\chi^2) \, d\chi^2 = \int_0^{\chi_2^2} f(\chi^2) \, d\chi^2 = \dfrac{\alpha}{2}$

Then the lower and upper confidence limits for $\sigma^2$ are given respectively by

(8.14.2)    $\sigma_1^2 = \dfrac{n s^2}{\chi_1^2}, \qquad \sigma_2^2 = \dfrac{n s^2}{\chi_2^2}$

and the confidence coefficient is $100(1 - \alpha)\%$. It is not, of course, necessary that
the two tails should each be $\alpha/2$ in area, provided that the sum is equal to $\alpha$.
However, it is usual to take them as equal.

EXAMPLE 5 The variance of a sample of size 10 is 0.064. What are the 95%
confidence limits for $\sigma^2$?

The values of $\chi_1^2$ and $\chi_2^2$ corresponding to $\alpha = 0.05$ are 19.023 and 2.700,
for 9 d.f. The confidence limits are therefore $\sigma_1^2 = \dfrac{0.576}{19.023} = 0.030$ and
$\sigma_2^2 = \dfrac{0.576}{2.700} = 0.213$.
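
Example 5 amounts to two divisions, as the brief Python sketch below shows. The function name is illustrative, and the two $\chi^2$ percentage points are the tabulated values quoted in the example, not computed ones.

```python
def variance_limits(s_sq, n, chi2_upper, chi2_lower):
    """Lower and upper confidence limits for sigma^2 from Eq. (8.14.2).

    chi2_upper is exceeded with probability alpha/2;
    chi2_lower is fallen short of with probability alpha/2.
    """
    return n * s_sq / chi2_upper, n * s_sq / chi2_lower


# Sample of size 10 (9 d.f.) with variance 0.064; table values for alpha = 0.05
lower, upper = variance_limits(s_sq=0.064, n=9, chi2_upper=19.023, chi2_lower=2.700)
print(f"95% limits for sigma^2: {lower:.3f} to {upper:.3f}")
```

The interval is far from symmetrical about the sample variance, reflecting the skewness of the $\chi^2$ distribution for small n.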

Confidence limits for the standard deviation $\sigma$ should, strictly, be obtained
from the distribution of s. Nevertheless, it is customary to obtain the limits for
$\sigma^2$ and use the positive square roots, in spite of the fact that the square root of $s^2$
is not an unbiased estimator for $\sigma$.



8.15 Distribution of the Variance Ratio Suppose $s_1^2$ and $s_2^2$ are the
observed variances for two samples, of sizes $N_1$ and $N_2$, drawn from normal
populations with variances $\sigma_1^2$ and $\sigma_2^2$ respectively. We can test the null hypothesis
$H_0$ that $\sigma_1^2 = \sigma_2^2$ by calculating the distribution of the variance ratio

(8.15.1)    $F = s_1^2/s_2^2$

This ratio is generally denoted by F in honour of Sir R. A. Fisher, although
Fisher originally used the related statistic

(8.15.2)    $z = \tfrac{1}{2} \log_e(s_1^2/s_2^2) = \tfrac{1}{2} \log_e F$

which has a more nearly symmetrical distribution than F.

On the null hypothesis, with $\sigma_1^2 = \sigma_2^2$,

(8.15.3)    $\dfrac{n_1 F}{n_2} = \dfrac{n_1 s_1^2/\sigma_1^2}{n_2 s_2^2/\sigma_2^2} = \dfrac{\chi_{n_1}^2}{\chi_{n_2}^2}$

both the numerator and denominator being $\chi^2$ variates with $n_1$ and $n_2$ d.f.
($n_1 = N_1 - 1$ and $n_2 = N_2 - 1$). The ratio is therefore a beta-prime variate
(see §4.5) with parameters $\alpha = n_1/2$, $\beta = n_2/2$, and its density function is

(8.15.4)    $f(x) = x^{\alpha - 1}(1 + x)^{-\alpha - \beta} / B(\alpha, \beta)$

with $x = n_1 F/n_2$.

The density function for F is given by

$g(F) \, dF = f(x) \dfrac{dx}{dF} \, dF = f(x) \, dx$

so that

(8.15.5)    $g(F) = \dfrac{(n_1/n_2)^{n_1/2} \, F^{(n_1 - 2)/2}}{B\left(\dfrac{n_1}{2}, \dfrac{n_2}{2}\right) \left(1 + \dfrac{n_1 F}{n_2}\right)^{(n_1 + n_2)/2}}, \qquad 0 \le F < \infty$

The numbers $n_1$ and $n_2$ are called the degrees of freedom for F. This is a
positively skew distribution. The mode (the value of F corresponding to a
maximum of g(F)) is at $F = \dfrac{n_2}{n_1} \cdot \dfrac{n_1 - 2}{n_2 + 2}$, which is always less than 1. The
expectation of F is given by

(8.15.6)    $E(F) = n_2/(n_2 - 2), \qquad n_2 > 2$

which is independent of $n_1$ and is always greater than 1.

The distribution of Fisher's z is found by writing $F = e^{2z}$, $dF = 2F \, dz$, and
its density function is

(8.15.7)    $f(z) = \dfrac{2 n_1^{n_1/2} n_2^{n_2/2} \, e^{n_1 z}}{B\left(\dfrac{n_1}{2}, \dfrac{n_2}{2}\right) (n_1 e^{2z} + n_2)^{(n_1 + n_2)/2}}$

8.16 Tables of the Distributions of F and z. Table B.5 in the Appendix gives
for various values of $n_1$ and $n_2$ the upper 5% and the upper 1% points of the
distribution of F. A complete table of the probability integral of F would be
quite bulky, being a triple-entry table, but since we usually merely want to know
whether an observed F value is significant or not, it is sufficient to have, for each
pair of values of $n_1$ and $n_2$, a few values of F corresponding to common levels of
significance. In the tables of Fisher and Yates [7], 20%, 10%, 5%, 1% and
0.1% points are given for both F and z.

Interpolation for $n_1$ and $n_2$, when necessary, should be harmonic instead of
linear. Thus, suppose the 1% point is required for $n_1 = 60$ and $n_2 = 55$. The
table gives the 1% points for 50 and 55 and for 75 and 55 as 1.90 and 1.82,
respectively. If x is the value for 60 and 55, harmonic interpolation gives

$\dfrac{1.90 - x}{1.90 - 1.82} = \dfrac{1/50 - 1/60}{1/50 - 1/75}$, so that x = 1.86.
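
Harmonic interpolation is simply linear interpolation in the reciprocal of the degrees of freedom. A minimal Python sketch follows; the function and argument names are hypothetical, and the tabulated points are those quoted above.

```python
def harmonic_interpolate(n, n_low, f_low, n_high, f_high):
    """Interpolate a percentage point linearly in 1/n between two tabulated entries."""
    fraction = (1 / n_low - 1 / n) / (1 / n_low - 1 / n_high)
    return f_low + fraction * (f_high - f_low)


# 1% points of F for n2 = 55: 1.90 at n1 = 50 and 1.82 at n1 = 75
x = harmonic_interpolate(60, n_low=50, f_low=1.90, n_high=75, f_high=1.82)
print(f"interpolated 1% point for n1 = 60: {x:.2f}")
```

Linear interpolation in n itself would place n = 60 only two-fifths of the way from 50 to 75 and give about 1.87 instead.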

EXAMPLE 6 For two samples, of sizes 8 and 12, the observed variances are
0.064 and 0.024, respectively. Since the table refers only to the upper points of
the distribution, we will take the subscript 1 to refer to the sample with the larger
variance. Then $n_1 = 7$, $n_2 = 11$, and $F = \dfrac{0.064}{0.024} = 2.67$, with 7 and 11 degrees
of freedom.

The 5% point is 3.01 and the 1% point is 4.88. The probability of a value of
F at least as great as 2.67 is therefore more than 0.05, and the two samples are
not significantly different at this level. From the Fisher and Yates tables we note
that the 10% point is 2.34, so that the difference is significant at the 10% level.


FIG. 42 THE F-DISTRIBUTION


A departure from the null hypothesis of equality of variances could just as
well give a value of F less than 1 as a value greater than 1. Corresponding to any
$F_1$ such that $\int_{F_1}^\infty g(F) \, dF = \alpha$ there is a value of $F_2$ such that $\int_0^{F_2} g(F) \, dF = \alpha$
(see Figure 42, where it is assumed that $\alpha < 0.5$).
If we put $u = 1/F$ in the equation

(8.16.1)    $\alpha = \int_{F_1}^\infty g(F) \, dF$

we obtain

(8.16.2)    $\alpha = \int_0^{1/F_1} \dfrac{(n_2/n_1)^{n_2/2} \, u^{(n_2 - 2)/2}}{B\left(\dfrac{n_2}{2}, \dfrac{n_1}{2}\right) \left(1 + \dfrac{n_2 u}{n_1}\right)^{(n_1 + n_2)/2}} \, du$

The integrand here is the same as g(u) in Eq. (8.15.5) but with $n_1$ and $n_2$
interchanged, and $1/F_1$ is the same as $F_2$. Therefore to find the lower 5% point, say,
for a given $n_1$ and $n_2$ we take the reciprocal of the upper 5% point, after
interchanging $n_1$ and $n_2$. This makes it unnecessary to have tables for both ends of the
distribution. It should be noted, however, that when we use a two-tailed test
(supposing that the departures from equality of variance may be in either
direction) the probabilities given in the table must be doubled. The 5% point
becomes a 10% point, for example.

If we write $F = \dfrac{n_2}{n_1} \cdot \dfrac{1 - x}{x}$, or, equivalently,

(8.16.3)    $x = n_2 (n_2 + n_1 F)^{-1}$

it is a straightforward matter to show that x is a beta-variate with parameters
$n_2/2$, $n_1/2$. The distribution function of x is therefore an incomplete beta function
and the tables of this function can be used to calculate the probability that x is less
than some observed value. Thus in Example 6, we should have $x = \dfrac{11}{11 + 7F}
= 0.371$. The probability of a value not greater than this is $I_{0.371}(5.5, 3.5) = 0.071$,
which, as previously noted, is greater than 0.05 but less than 0.1.
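
Where tables of the incomplete beta function are not at hand, the probability can be approximated by direct numerical integration of the beta density. The Python sketch below uses Simpson's rule; the function name and step count are assumptions, and the routine is only intended for parameters greater than 1, where the integrand is well behaved at the origin.

```python
import math


def regularized_incomplete_beta(x, a, b, steps=2000):
    """I_x(a, b) by Simpson's rule; adequate for a, b > 1 and 0 < x < 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp((a - 1) * math.log(u) + (b - 1) * math.log(1 - u) - log_beta)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0


# Example 6: n1 = 7, n2 = 11, F = 0.064 / 0.024
n1, n2 = 7, 11
F = 0.064 / 0.024
x = n2 / (n2 + n1 * F)                           # Eq. (8.16.3)
p = regularized_incomplete_beta(x, n2 / 2, n1 / 2)

print(f"x = {x:.3f}, P(F at least as large as observed) = {p:.3f}")
```

The result falls between the 5% and 10% levels, in agreement with the comparison against the tabulated percentage points.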


* 8.17 The Power of the F-Test As in §8.15, we assume that the null
hypothesis $H_0$ is that $\sigma_1^2 = \sigma_2^2$ (both populations being normal). Let the
alternative hypothesis $H_1$ be that $\sigma_1^2/\sigma_2^2 = \lambda$, which we may take greater than
1. Then $H_0$ is rejected if $F > F_\alpha$, where

(8.17.1)    $\int_{F_\alpha}^\infty g(F) \, dF = \alpha$

The power of this test is the probability that $F > F_\alpha$, under $H_1$,

(8.17.2)    $P = \Pr(F > F_\alpha \mid H_1)$

Now the statistic $F\sigma_2^2/\sigma_1^2 = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ on hypothesis $H_1$ has the F distribution
with $n_1$ and $n_2$ d.f. This follows because $(n_1/n_2) \cdot (F\sigma_2^2/\sigma_1^2)$ is the ratio of two
$\chi^2$ variates with $n_1$ and $n_2$ d.f., respectively. If, therefore, $F_P$ is a value of F which
is exceeded with probability P, under $H_0$,

(8.17.3)    $P = \Pr\left(\dfrac{F\sigma_2^2}{\sigma_1^2} > F_P \,\Big|\, H_1\right)$

From Eqs. (2) and (3), $F_\alpha = \sigma_1^2 F_P/\sigma_2^2 = \lambda F_P$.

We can use tables of F (preferably those by Merrington and Thompson, see
[7]) to calculate $\lambda$ for selected values of P. For example if $N_1 = N_2 = 10$,
$\alpha = 0.05$ and $P = 1 - \beta = 0.5$, we find $F_\alpha = 3.18$ and $F_P = 1.00$, so that
$\lambda = 3.18$. This means that we have an even chance of recognizing a difference
between the variances when one is 3.18 times the other if we agree to accept a 5%
chance of rejecting the null hypothesis when the variances are really equal.

The tables can also be used to estimate the size of sample necessary to have a
given chance of observing a stated difference in variance. Thus suppose that a
suggested new process of manufacturing some metal part might be expected to
reduce the standard deviation of tensile strength by a factor of 1.41 (which
means halving the variance). We would need, in order to have an even chance
of detecting such an effect, samples of size 25, and to have a 95% chance we
would need samples of nearly 100, the rejection error remaining at 5%.


* 8.18 The Variance of Sample Skewness and Kurtosis It is possible by
means of long and rather tedious algebra to work out the moments of $k_3$, $k_4$, etc.
in samples from a population with known cumulants. Even for a normal parent
population the expressions are long, for any moments above the second. Here
we shall simply state a few results for the normal case.

For the third k-statistic,

(8.18.1)    $E(k_3) = 0, \qquad V(k_3) = \dfrac{6\kappa_2^3 N}{(N - 1)(N - 2)}$

and an unbiased estimator of this variance is

(8.18.2)    $\hat{V}(k_3) = \dfrac{6 k_2^3 N(N - 1)}{(N - 2)(N + 1)(N + 3)}$

For the fourth k-statistic,

(8.18.3)    $E(k_4) = 0, \qquad V(k_4) = \dfrac{24\kappa_2^4 N(N + 1)}{(N - 1)(N - 2)(N - 3)}, \qquad \hat{V}(k_4) = \dfrac{24 k_2^4 N(N - 1)^2}{(N - 3)(N - 2)(N + 3)(N + 5)}$

The variances of $g_1 = k_3/k_2^{3/2}$ and of $g_2 = k_4/k_2^2$ (the sample skewness and
kurtosis respectively) were worked out by R. A. Fisher, who found that

(8.18.4)    $V(g_1) = \dfrac{6N(N - 1)}{(N - 2)(N + 1)(N + 3)}$

(8.18.5)    $V(g_2) = \dfrac{24N(N - 1)^2}{(N - 3)(N - 2)(N + 3)(N + 5)}$

For large N these approximate to the values 6/N and 24/N, respectively.
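
The exact variances of Eqs. (8.18.4) and (8.18.5) are easily compared with their large-sample approximations. In the Python sketch below the function names and the chosen sample sizes are illustrative only.

```python
def var_g1(N):
    """Exact variance of the sample skewness g1 for a normal parent, Eq. (8.18.4)."""
    return 6 * N * (N - 1) / ((N - 2) * (N + 1) * (N + 3))


def var_g2(N):
    """Exact variance of the sample kurtosis g2 for a normal parent, Eq. (8.18.5)."""
    return 24 * N * (N - 1) ** 2 / ((N - 3) * (N - 2) * (N + 3) * (N + 5))


for N in (25, 100, 1000):
    print(f"N = {N:5d}  V(g1) = {var_g1(N):.5f} (6/N = {6 / N:.5f})  "
          f"V(g2) = {var_g2(N):.5f} (24/N = {24 / N:.5f})")
```

For samples of a few dozen the approximations 6/N and 24/N overstate the variances noticeably, so the exact forms are preferable there.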


8.19 The Distribution of Extreme Values It is sometimes convenient to
judge a sample by the largest (or smallest) item in it. If the observed values of X
for the sample are arranged in ascending order,

$x_1 \le x_2 \le \cdots \le x_N$

and if F(x) is the distribution function for the parent population, the probability
that $x_N < x$ is $F(x)^N$. This is true because if $x_N < x$, the same inequality must
hold for all the other values in the sample, which are all supposed to be independent.
The probability that the largest item in a sample of size N from a standard
normal population is less than x is given by

(8.19.1)    $P(x_N < x) = [\Phi(x)]^N$

where

$\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^x e^{-u^2/2} \, du$

Values of the lower and upper percentage points of the largest value $x_N$ have
been calculated by Tippett and Pearson [8]. The same table applies to the
smallest value $x_1$ with a change of sign and a reversal of the terms "upper" and
"lower." A brief extract from this table is appended.

TABLE 8.3

Sample Size     Upper-Percentage Points
     N              5%         1%
     5            2.319      2.877
    10            2.568      3.089
    15            2.705      3.207
    20            2.799      3.289
    30            2.929      3.402
    50            3.082      3.539
   100            3.283      3.718
  1000            3.884      4.264


Such a table is useful in some types of quality control problems. If, for example,
a manufacturer is producing a certain article for which the average breaking
strength should be 180 lb with a standard deviation of not more than 12 lb, and
if routine samples of size 10 are tested, the lowest value in a sample should not be
below 180 − 12(3.089) = 142.9 lb more than once in 100 times. If such a low
value is observed it might be worth while to look into possible causes.
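
Entries of the kind shown in Table 8.3 follow from Eq. (8.19.1) by inverting the normal distribution function. The Python sketch below is an illustration with hypothetical function names; it uses the standard library's `NormalDist` and assumes a standard normal parent population.

```python
from statistics import NormalDist


def largest_value_upper_point(N, tail):
    """Value x with P(x_N > x) = tail, i.e. Phi(x)**N = 1 - tail (Eq. 8.19.1)."""
    return NormalDist().inv_cdf((1.0 - tail) ** (1.0 / N))


for N in (5, 10, 100):
    print(f"N = {N:4d}  5%: {largest_value_upper_point(N, 0.05):.3f}  "
          f"1%: {largest_value_upper_point(N, 0.01):.3f}")

# Quality-control limit from the text: mean 180 lb, s.d. 12 lb, samples of 10
lower_limit = 180 - 12 * largest_value_upper_point(10, 0.01)
print(f"lowest value rarely below {lower_limit:.1f} lb")
```

By symmetry the same points, with sign reversed, serve for the smallest value in the sample, which is how the breaking-strength limit is obtained.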
* 8.20 The Rejection of Extreme Observations The question of whether to
reject an extreme value (often called a "straggler") in a set of observations is one
that sometimes poses a difficulty in experimental work. If we can assume that the
sample comes from a normal population of known mean and variance, the
distribution of the extreme value as given in §8.19 will enable us to calculate
the risk we run in rejecting the straggler. In practice, however, the mean
and variance are not generally known and we must substitute estimates derived
from the sample itself, but the distribution is then not precisely that of §8.19.

For a sample of size N with mean $\bar{x}$ and standard deviation s, the distribution
of

(8.20.1)    $u_i = (x_i - \bar{x})/s$

where $x_i$ is the value of X for the straggler suspected, was worked out by W. R.
Thompson [9]. He found that the quantity

(8.20.2)    $t = u_i \left[\dfrac{N - 2}{\dfrac{(N - 1)^2}{N} - u_i^2}\right]^{1/2}$

has the Student-t distribution with N − 2 d.f., so that the probability of such a
value arising by chance in a normal parent population can be calculated. This,
however, refers to a single observation and not to the smallest or largest in a
sample, and care should therefore be used in interpretation. In a sample of
20 from the same normal population we could expect that one, by pure chance,
would reach a t-value corresponding to probability 0.05. As a rough rule-of-thumb,
one might agree to require a probability of less than 0.01 for a sample
of size less than 10 and a probability of less than 0.005 for one of size 10 to 20,
before rejecting the extreme value.
W. J. Dixon [10] has suggested the use of a simple ratio criterion for the
rejection of $x_N$, and this requires very little computation. For samples of size
8 to 12 we compute $r_{11} = (x_N - x_{N-1})/(x_N - x_2)$, and for larger samples
$r_{22} = (x_N - x_{N-2})/(x_N - x_3)$. If the ratio exceeds a critical value $R_\alpha$, the
probability is less than $\alpha$ that the extreme value $x_N$ comes from the same normal
population as the rest of the observations. When the lowest value in the set is the
one suspected, the observations should be placed in reverse order so that $x_N$ is
still the straggler. Table 8.4 is a brief extract from the tables [11] giving
values of $R_\alpha$ for $\alpha = 0.05$ and 0.01, and N from 8 to 16. If $r_{11}$ (or $r_{22}$) is greater
than $R_{0.05}$, the observation $x_N$ may reasonably be rejected. This table applies
when we do not know before seeing the data whether we shall want to test the
highest or the lowest value, and so corresponds to $\alpha = 0.025$ and 0.005 in
Dixon's tables.

TABLE 8.4


EXAMPLE 7 The following 15 observations were made of the vertical semi-diameter
of the planet Venus in seconds of arc (43.00" have been subtracted from
each reading and the readings have been rearranged in ascending order):

−1.40, −0.44, −0.30, −0.24, −0.22, −0.13, −0.05,
0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01

The observation −1.40 is rather suspiciously low. The mean of all the readings
is 0.018 and the standard deviation is 0.551 so that, from Table 8.3, the lowest
value in the sample should not be below 0.018 − 2.705 (0.551) = −1.47 more
than once in 20 times, if the sample mean and variance apply to the population.
The observed −1.40 is therefore not too unreasonable, on this supposition.
According to Thompson's criterion, $u_i = -2.574$ and $t = -3.66$, with 13 d.f.,
and the probability of a single value as low as this is less than 0.005. We
might, on this criterion, reasonably reject the straggler. If we do so, and then
test the remaining 14 observations, the largest has a t of 2.874 with 12 d.f. Since
the probability is between 0.05 and 0.01, we should be chary of rejecting it.

Applying Dixon's criterion, we get $r_{22} = 0.585$, which is greater than the
value 0.565 in the table, and so would lead to rejection. After rejecting the
lowest value, the ratio for the highest remaining value is 0.424, and this suggests
retention of the straggler.
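
The quantities used in Example 7 can be recomputed with the Python sketch below. The function names are hypothetical; Thompson's t is evaluated in the algebraically equivalent form $t = u_i[(N - 2)/((N - 1)^2/N - u_i^2)]^{1/2}$ with s based on the divisor N − 1, and only Dixon's ratio $r_{22}$ is implemented. Critical values are not computed and must still be taken from tables.

```python
import math
from statistics import mean, stdev


def thompson_t(values, suspect):
    """Return (u, t) for a suspected straggler, Eqs. (8.20.1) and (8.20.2)."""
    N = len(values)
    u = (suspect - mean(values)) / stdev(values)     # stdev uses divisor N - 1
    t = u * math.sqrt((N - 2) / ((N - 1) ** 2 / N - u * u))
    return u, t


def dixon_r22(values, lowest=False):
    """Dixon's ratio r22; with lowest=True the order is reversed so the straggler is last."""
    xs = sorted(values, reverse=lowest)
    return (xs[-1] - xs[-3]) / (xs[-1] - xs[2])


venus = [-1.40, -0.44, -0.30, -0.24, -0.22, -0.13, -0.05,
         0.06, 0.10, 0.18, 0.20, 0.39, 0.48, 0.63, 1.01]

u_low, t_low = thompson_t(venus, min(venus))
r_low = dixon_r22(venus, lowest=True)

remaining = sorted(venus)[1:]                        # lowest reading rejected
u_high, t_high = thompson_t(remaining, max(remaining))
r_high = dixon_r22(remaining)

print(f"lowest value : u = {u_low:.3f}, t = {t_low:.2f}, r22 = {r_low:.3f}")
print(f"highest value: u = {u_high:.3f}, t = {t_high:.3f}, r22 = {r_high:.3f}")
```

Both criteria point the same way here: the lowest reading is a candidate for rejection, while the highest remaining reading is not.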


8.21 The Distribution of the Range The range of a sample, with the observed 
values placed in ascending order of size, is given by 


(8.21.1) К = ху х; 


If F(x) is the distribution function for the parent population, the probability 
that N — 2 values lie between x, and xy is [F(xy) — F(x,)]¥~?. The probability 
that one specified observation has the value x, (to x, + dx,) is f(x) dx, and 
similarly the probability for one specified observation to be equal to xy is 
S (xy) Яху. Since, however, there are N(N — 1) ways in which these two extreme 
observations may appear in the original order of the observations (x, could be 
in any one of N places and xy in any of the remaining N — 1 places), the 
probability of a sample with lowest value x, and highest value xy is 


(8.21.2)    $P(x_1, x_N) = N(N - 1)[F(x_N) - F(x_1)]^{N-2} f(x_1) f(x_N)\,dx_1\,dx_N$

Putting $x_N = x_1 + R$, we find as the joint probability density for $x_1$ and $R$

(8.21.3)    $f(x_1, R) = N(N - 1)[F(x_1 + R) - F(x_1)]^{N-2} f(x_1) f(x_1 + R)$

Integrating over all values of $x_1$, we obtain the probability density for $R$,
namely,

(8.21.4)    $g(R) = N(N - 1) \int_{-\infty}^{\infty} [F(x_1 + R) - F(x_1)]^{N-2} f(x_1) f(x_1 + R)\,dx_1$

The distribution function for $R$ is

(8.21.5)    $G(R) = \int_0^R g(u)\,du.$


If we write $F(x_1 + u) - F(x_1) = y$, then, for a fixed $x_1$, $dy = f(x_1 + u)\,du$.
Substituting from Eq. (4) in Eq. (5) and reversing the order of integration
(which is legitimate here), we obtain

(8.21.6)    $G(R) = N(N - 1) \int_{-\infty}^{\infty} f(x_1) \int_{u=0}^{u=R} y^{N-2}\,dy\,dx_1$
                 $= N \int_{-\infty}^{\infty} f(x_1)[F(x_1 + R) - F(x_1)]^{N-1}\,dx_1$

The expected value of $R$, $E(R)$, has been calculated by Tippett (see, e.g.,
[12], page 338) as

(8.21.7)    $E(R) = \int_{-\infty}^{\infty} [1 - F^N - (1 - F)^N]\,dx$


where $F$ stands for $F(x)$. For a standard normal parent population ($\mu = 0$,
$\sigma = 1$), $F(x) = \Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} e^{-u^2/2}\,du$. If the range in a sample from
this population is denoted by $w$, Tippett's values of $E(w)$ for $N = 2(1)500(10)1000$
are given in Biometrika Tables for Statisticians, Vol. I, Table 27. Examination
of this table shows that for $N$ between 350 and 550 the value of $E(w)$ is
close to 6. This is the reason for the common practice of estimating roughly the
standard deviation from a sample of several hundred items as one-sixth of the
range.
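As an illustrative sketch (not part of Tippett's calculation), Eq. (8.21.7) can be evaluated numerically for the standard normal parent. The function names `phi_cdf` and `expected_range`, the integration limits, and the use of Simpson's rule are assumptions made here for illustration only.

```python
import math

def phi_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_range(n, lo=-10.0, hi=10.0, steps=4000):
    """E(w) from Eq. (8.21.7), integrated by Simpson's rule on [lo, hi].

    `steps` must be even; the integrand is negligible outside [lo, hi].
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        F = phi_cdf(x)
        y = 1.0 - F ** n - (1.0 - F) ** n
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * y
    return total * h / 3.0

# Expected range for a few sample sizes, including one of several hundred items.
for n in (2, 5, 10, 500):
    print(n, round(expected_range(n), 3))
```

For a sample of several hundred items the computed value lies close to 6, which is the basis of the one-sixth rule mentioned above.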

Values of $G(w)$ for the standard normal population have been calculated for
$N = 2$ to 20 by E. S. Pearson and H. O. Hartley and may be found in the
Biometrika Tables, Table 23. More complete tables are given in reference [13].
The expression for $G(w)$ may be reduced to

(8.21.8)    $G(w) = [2\Phi(w/2) - 1]^N + 2N \int_{w/2}^{\infty} [\Phi(u) - \Phi(u - w)]^{N-1} \phi(u)\,du$

but the evaluation must be carried out by numerical methods. The distribution
does not approach normality as $N$ increases.
In practice the range is used mainly for small samples, of size 5 or 10, say,
such as commonly occur in the applications of quality control in industry. The
range is certainly a very convenient measure of dispersion because of the
simplicity of its calculation. For large samples the distribution becomes very
sensitive to departures from normality in the parent population, particularly as
regards kurtosis.


* 8.22 Tests of Hypotheses concerning the Variance of a Normal Population by
use of the Sample Range   By comparing the observed range with the tabulated
value of $E(w)$ for a standard normal population, we may estimate the
standard deviation of the actual normal population. For samples of size $< 10$
the efficiency of this method is at least 85%, and the calculation is very easy.
If the observed range is $R$, an unbiased estimator of $\sigma$ is given by

(8.22.1)    $\hat{\sigma} = R/E(w) = kR$

The values of $k$ for a few sample sizes are given in the following table, which
also includes upper and lower 0.5 percentage points for $w = R/\sigma$. These may be
used in establishing 99% confidence limits for $\sigma$ based on the range of a single
sample.


TABLE 8.5*

  N      k      Lower 0.5%   Upper 0.5%
  2    0.886       0.01         3.97
  3     .591        .13         4.42
  4     .486        .34         4.69
  5     .430        .55         4.89
  6     .395        .75         5.03
  7     .370        .92         5.15
  8     .351       1.08         5.26
  9     .337       1.21         5.34
 10     .325       1.33         5.42
 15     .288       1.80         5.70
 20     .268       2.12         5.89

*Extracted from Table 22, reference [8], by kind permission of Professor E. S.
Pearson and the publishers of Biometrika.

Thus if the observed range in a sample of five items is $R = 8$, the estimate of
$\sigma$ would be $8(0.430) = 3.44$. The 99% confidence limits for $\sigma$ would be $8/(4.89)$
and $8/(0.55)$, that is, 1.64 and 14.5.

If we wish to test the hypothesis $H_0$ (that $\sigma = 1$) against the alternative $H_1$
(that $\sigma > 1$) and if we use a test of size $\alpha$, the critical value of $w$ will be the upper
$100\alpha\%$ point. Thus for $\alpha = 0.005$, the critical value for a sample of size 10 is
5.42 (see Table 8.5). If $\alpha = 0.05$ we find from a larger table that the critical
value is 4.47. The following sample of random normal numbers,

-2.015, -0.623, -0.699, 0.481, -0.586,
-0.579, -0.120, 0.191, 0.071, -3.001

has a range $w = 3.482$, and hence the hypothesis $H_0$ would not be rejected by
this test. The test is less powerful than that based on the sample variance (using
a table of chi-square) but is easier to apply.
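The two calculations of this section can be sketched in a few lines. The dictionary `TABLE_8_5` holds only the two rows of Table 8.5 needed here, and the helper names `sigma_from_range` and `sample_range` are hypothetical, introduced for illustration; the 5% critical value 4.47 is the one quoted in the text.

```python
# Rows of Table 8.5: N -> (k, lower 0.5% point of w, upper 0.5% point of w)
TABLE_8_5 = {5: (0.430, 0.55, 4.89), 10: (0.325, 1.33, 5.42)}

def sigma_from_range(R, N):
    """Estimate of sigma, Eq. (8.22.1), and 99% confidence limits from one range."""
    k, w_lower, w_upper = TABLE_8_5[N]
    estimate = k * R
    limits = (R / w_upper, R / w_lower)
    return estimate, limits

def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)

# Worked example: range R = 8 observed in a sample of five items.
estimate, limits = sigma_from_range(8, 5)
print("estimate of sigma:", round(estimate, 2))
print("99% limits:", round(limits[0], 2), round(limits[1], 1))

# Test of H0 (sigma = 1) against H1 (sigma > 1) on the sample of ten normal numbers.
sample = [-2.015, -0.623, -0.699, 0.481, -0.586,
          -0.579, -0.120, 0.191, 0.071, -3.001]
w = sample_range(sample)
reject_at_0005 = w > TABLE_8_5[10][2]   # upper 0.5% point
reject_at_005 = w > 4.47                # upper 5% point quoted in the text
print("range:", round(w, 3), "reject:", reject_at_0005, reject_at_005)
```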


* 8.23 The Range in Samples from a Rectangular Distribution   If the parent
population is rectangular, so that (with suitable units) $F(x) = x$, $0 < x < 1$, the
distribution function for the range can easily be obtained explicitly. We have

(8.23.1)    $G(R) = N \int_0^1 [F(x_1 + R) - F(x_1)]^{N-1}\,dx_1$

Now $F(x_1 + R) = x_1 + R$ as long as $x_1 + R < 1$, but, when $x_1 > 1 - R$,
$F(x_1 + R)$ remains equal to 1. The region of integration for a given $R$ must
therefore be split into two parts, from 0 to $1 - R$ and from $1 - R$ to 1 (see
Figure 43).

FIG. 43 DISTRIBUTION FUNCTION FOR RECTANGULAR DISTRIBUTION

Then,

(8.23.2)    $G(R) = N \int_0^{1-R} (x_1 + R - x_1)^{N-1}\,dx_1 + N \int_{1-R}^{1} (1 - x_1)^{N-1}\,dx_1$
                 $= N(1 - R)R^{N-1} + R^N = NR^{N-1} - (N - 1)R^N$

The probability density is

(8.23.3)    $g(R) = N(N - 1)R^{N-2}(1 - R)$

which has a maximum at $R = (N - 2)/(N - 1)$. The expected value of $R$ is
$(N - 1)/(N + 1)$.
A rectangular population is not quite as artificial as it may appear. In the
production of machine parts in a factory to rather narrow specification limits,
when only those articles which comply with the specification are included
in the population, the hypothesis of a rectangular distribution seems not
unreasonable.
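The closed forms of Eqs. (8.23.2) and (8.23.3) can be checked against a small simulation. This is only a sketch: the sample size $N = 5$, the number of trials, the seed, and the names `range_cdf` and `simulate_ranges` are arbitrary choices made here, not part of the text.

```python
import random

def range_cdf(R, N):
    """G(R) from Eq. (8.23.2) for a rectangular parent on (0, 1)."""
    return N * R ** (N - 1) - (N - 1) * R ** N

def simulate_ranges(N, trials, seed=0):
    """Draw `trials` samples of size N from (0, 1) and return their ranges."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(N)]
        ranges.append(max(xs) - min(xs))
    return ranges

N = 5
ranges = simulate_ranges(N, trials=20000)
empirical_mean = sum(ranges) / len(ranges)
empirical_cdf_half = sum(r <= 0.5 for r in ranges) / len(ranges)

# Compare simulated values with the expected range (N - 1)/(N + 1) and with G(0.5).
print("mean range: simulated", round(empirical_mean, 3),
      "theory", round((N - 1) / (N + 1), 3))
print("G(0.5): simulated", round(empirical_cdf_half, 3),
      "theory", round(range_cdf(0.5, N), 3))
```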


* 8.24 A Test for Homogeneity of Two Rectangular Populations, Based on the Range
If $R_1$ and $R_2$ are the ranges in two random samples of sizes $N_1$ and $N_2$, assumed
to come from rectangular populations of widths $C_1$ and $C_2$, respectively, the
distribution of the quotient of ranges $R_1/R_2$, under the null hypothesis that
$C_1 = C_2 = C$, has been worked out by Rider [14].

The probability density for $u = R_1/R_2$ turns out to be independent of $C$ and
is given by

(8.24.1)    $f(u) = \dfrac{N_1 N_2 (N_1 - 1)(N_2 - 1)\,[(N_1 + N_2)u^{N_1 - 2} - (N_1 + N_2 - 2)u^{N_1 - 1}]}{(N_1 + N_2)(N_1 + N_2 - 1)(N_1 + N_2 - 2)}$    if $0 < u \le 1$

and

(8.24.2)    $f(u) = \dfrac{N_1 N_2 (N_1 - 1)(N_2 - 1)\,[(N_1 + N_2)u^{-N_2} - (N_1 + N_2 - 2)u^{-N_2 - 1}]}{(N_1 + N_2)(N_1 + N_2 - 1)(N_1 + N_2 - 2)}$    if $1 \le u < \infty$.

The expected value of $u$ is $(N_1 - 1)N_2/[(N_1 + 1)(N_2 - 2)]$.

Rider gives a table for the quotient of ranges which will be exceeded in 5%
of random samples, and this table may be used for testing the null hypothesis.


EXAMPLE 8   The width of a slot in a certain airplane part was measured to
the thousandth of an inch in a sample of five parts on the first day of production
and again in a sample of 10 parts two days later. The results (in thousandths of
an inch in excess of 0.800 in.) were

(1) 77, 80, 78, 72, 78
(2) 75, 77, 75, 76, 77, 79, 75, 78, 77, 76

We see that $N_1 = 5$, $R_1 = 8$, $N_2 = 10$, $R_2 = 4$. Then $u = 2$. The probability
of a value as great as this can be obtained by integrating Eq. (2) from $u = 2$
to $\infty$ and is 0.0013. It appears, therefore, that the second sample is pretty
definitely more uniform than the first. The 5% critical value of $u$ is actually 1.27.

This test is analogous to the F-test, discussed in §§ 8.15 and 8.16. The null
hypothesis that the quotient of population ranges is 1 is tested against the
alternative hypothesis that the quotient is greater than 1. We can always make
$u > 1$ by choosing for sample (1) that with the greater range.
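The tail probability in Example 8 follows from integrating Eq. (8.24.2) term by term, which gives a closed form for $u \ge 1$. The sketch below uses that closed form; the function names `quotient_tail` and `critical_value` and the bisection search are illustrative assumptions, not Rider's method of tabulation.

```python
def quotient_tail(u, N1, N2):
    """P(U >= u) for u >= 1: Eq. (8.24.2) integrated from u to infinity."""
    S = N1 + N2
    K = N1 * N2 * (N1 - 1) * (N2 - 1) / (S * (S - 1) * (S - 2))
    return K * (S * u ** (1 - N2) / (N2 - 1) - (S - 2) * u ** (-N2) / N2)

def critical_value(alpha, N1, N2, lo=1.0, hi=100.0, iters=60):
    """Bisection for the u >= 1 whose upper-tail probability equals alpha.

    Valid only when alpha is smaller than P(U >= 1), since the tail
    probability decreases steadily as u increases beyond 1.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if quotient_tail(mid, N1, N2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example 8: N1 = 5, R1 = 8, N2 = 10, R2 = 4, so u = 2.
p = quotient_tail(8 / 4, 5, 10)
u_crit = critical_value(0.05, 5, 10)
print("P(U >= 2) =", round(p, 4))
print("5% critical value =", round(u_crit, 2))
```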


8.25 The Distribution of Order Statistics   The $r$th order-statistic of a sample
of size $N$ is the $r$th smallest variate-value in the sample. If the values are arranged
in ascending order of size, $x_1 < x_2 < x_3 \ldots < x_N$, the $r$th order-statistic is $x_r$.
For a sample of size $2r + 1$, the $(r + 1)$th order-statistic is the median (the
middle value).

In a sample of size $N$ from a population with a continuous distribution
function $F(x)$, the probability that $x_r = x$ (to $x + dx$) is

(8.25.1)    $g(x)\,dx = C[F(x)]^{r-1}[1 - F(x)]^{N-r} f(x)\,dx$

since there are $r - 1$ observations smaller than $x$ and $N - r$ observations
larger than $x$. The constant $C$ is found from the condition

(8.25.2)    $\int_{-\infty}^{\infty} g(x)\,dx = 1$

Writing $F(x) = F$, $f(x)\,dx = dF$, and noting that $F$ goes from 0 to 1 as $x$ goes
from $-\infty$ to $\infty$, we see that Eq. (2) gives

(8.25.3)    $C \int_0^1 F^{r-1}(1 - F)^{N-r}\,dF = 1$

whence

(8.25.4)    $C^{-1} = B(r, N - r + 1)$


For a rectangular parent population, $F(x) = x$ and $g(x)$ is simply the ordinary
beta-distribution.

For a normal population, the study of the distribution involves the numerical
calculation of certain integrals. Thus for the median, with $N = 2r + 1$, and
with $F(x) = \Phi(x)$, we have

(8.25.5)    $g(x)\,dx = C[\Phi(x)]^r [1 - \Phi(x)]^r \phi(x)\,dx$

where

(8.25.6)    $C^{-1} = B(r + 1, r + 1) = (r!)^2/N!$

Note that the $r$ of Eq. (1) is now replaced by $r + 1$, since the median is $x_{r+1}$
when $N = 2r + 1$. For $N = 2r$, the median is taken as $(x_r + x_{r+1})/2$.

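It is obvious from the symmetry of the parent population (which we have
taken as standardized) that the expected value of the median will be zero. The
variance is

(8.25.7)    $V(x_{r+1}) = \int_{-\infty}^{\infty} x^2 g(x)\,dx = \dfrac{N!}{(r!)^2} \int_{-\infty}^{\infty} x^2 \Phi^r (1 - \Phi)^r \phi\,dx$

where $\phi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ and $\Phi$ is written for $\Phi(x)$. If $(1 - \Phi)^r$ is expanded
binomially as

(8.25.8)    $(1 - \Phi)^r = \sum_{i=0}^{r} (-1)^i \binom{r}{i} \Phi^i$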

integration by parts, with $x e^{-x^2/2}$ as one part, yields the result

(8.25.9)    $V(x_{r+1}) = \dfrac{N!}{(r!)^2} \sum_{i=0}^{r} (-1)^i \binom{r}{i} \left[ \dfrac{1}{r + i + 1} + \dfrac{(r + i)(r + i - 1)}{2(2\pi)^{3/2}}\, T_{r+i} \right]$

where

(8.25.10)    $T_{r+i} = \int_{-\infty}^{\infty} [\Phi(x)]^{r+i-2} \exp(-3x^2/2)\,dx$

By calculating the different integrals of this type, the variance, and also
higher moments, may be obtained. An approximate expression for the variance
of the median in odd samples from a standard normal population is

(8.25.11)    $V(x_{r+1}) \approx \dfrac{\pi}{2(N + 2)} + \dfrac{\pi^2}{4(N + 2)(N + 4)}$


For a sample of size 11 this gives 0.133, the correct value being 0.137.

The corresponding variance of the mean is 0.091, so that the efficiency of the
median in a sample of this size is 66.4%. As $N$ increases, the efficiency tends to
the value $2/\pi = 0.637$. On the average, therefore, we can get about as good a
value of the population mean from the median of 100 observations as from the
mean of 64.
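The figures quoted for a sample of size 11 can be checked by integrating Eq. (8.25.7) directly. The following is a minimal sketch under stated assumptions: Simpson's rule on a finite interval replaces the integration by parts described in the text, and the names `median_variance` and `approx_variance` are introduced here for illustration.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def median_variance(r, lo=-10.0, hi=10.0, steps=4000):
    """Variance of the median for N = 2r + 1, Eq. (8.25.7), by Simpson's rule."""
    N = 2 * r + 1
    C = math.factorial(N) / math.factorial(r) ** 2
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        y = x * x * Phi(x) ** r * (1.0 - Phi(x)) ** r * phi(x)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * y
    return C * total * h / 3.0

def approx_variance(N):
    """Approximation of Eq. (8.25.11)."""
    return math.pi / (2 * (N + 2)) + math.pi ** 2 / (4 * (N + 2) * (N + 4))

exact = median_variance(5)          # N = 11
approx = approx_variance(11)
efficiency = (1.0 / 11.0) / exact   # variance of the mean divided by that of the median
print(round(exact, 3), round(approx, 3), round(efficiency, 3))
```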

If $\tilde{\mu}$ is the population median, for a population with density $f(x)$, so that

(8.25.12)    $\int_{-\infty}^{\tilde{\mu}} f(x)\,dx = 1/2$

then as $N \to \infty$ the sample median is approximately normally distributed with
mean $\tilde{\mu}$ and variance $[4(N + 2) f(\tilde{\mu})^2]^{-1}$.

For the rectangular distribution, for which $f(\tilde{\mu}) = 1$, this value is exact. For
the normal distribution, $f(\tilde{\mu}) = (2\pi)^{-1/2}$, and the approximation is $\pi/[2(N + 2)]$.


* 8.26 The Asymptotic Distribution of the Extreme Value   The distribution
of the largest value in a sample is of interest in certain applications, as for
instance in predicting the occurrence of exceptional floods in river flow. If in
Eq. (8.25.1) we put $r = N$, the probability density for the largest value is found,
as in § 8.19, to be

(8.26.1)    $g(x)\,dx = N[F(x)]^{N-1} f(x)\,dx$

and the distribution function is

(8.26.2)    $G(x) = [F(x)]^N$

Let $x_0$ be defined by the relation

(8.26.3)    $N[1 - F(x_0)] = 1$

Since $N[1 - F(x_0)]$ is the expected number of values exceeding $x_0$ in a
sample of size $N$, Eq. (3) states that in such a sample we may expect $x_0$ to be
exceeded just once.

We will first suppose that the distribution in the parent population is
exponential, so that

$F(x) = 1 - e^{-\alpha x}$,    $f(x) = \alpha e^{-\alpha x}$,    and    $e^{\alpha x_0} = N$

Therefore,

$[F(x)]^N = [1 - e^{-\alpha x}]^N = \left[1 - \dfrac{e^{-\alpha(x - x_0)}}{N}\right]^N$

and the limit of this as $N \to \infty$ is $\exp[-e^{-\alpha(x - x_0)}]$. We have as the limiting
distribution, therefore,

(8.26.4)    $\lim G(x) = \exp[-e^{-\alpha(x - x_0)}]$

and, consequently,

(8.26.5)    $\lim g(x) = \alpha e^{-\alpha(x - x_0)} \exp[-e^{-\alpha(x - x_0)}]$

If $y = \alpha(x - x_0)$ the density function for $y$ is

(8.26.6)    $h(y) = e^{-y} \exp(-e^{-y})$

which is a form used by Gumbel in a study of floods [15].
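How quickly $[F(x)]^N$ approaches the limiting form of Eq. (8.26.4) can be seen numerically. The sketch below works in the reduced variable $y = \alpha(x - x_0)$; the sample size 1000, the grid of $y$ values, and the function names `exact_cdf` and `gumbel_cdf` are assumptions chosen only for illustration.

```python
import math

def exact_cdf(y, N):
    """[F(x)]^N for the exponential parent, written in terms of y = alpha (x - x0)."""
    t = math.exp(-y) / N      # equals exp(-alpha x), since exp(alpha x0) = N
    return (1.0 - t) ** N if t < 1.0 else 0.0

def gumbel_cdf(y):
    """Limiting distribution function of Eq. (8.26.4) in terms of y."""
    return math.exp(-math.exp(-y))

N = 1000
grid = [-3.0 + 0.1 * i for i in range(91)]    # y from -3 to 6
max_gap = max(abs(exact_cdf(y, N) - gumbel_cdf(y)) for y in grid)
print("largest difference on the grid:", max_gap)
```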
For the normal distribution,

$f(x) = (2\pi)^{-1/2} e^{-x^2/2}$

and

(8.26.7)    $1 - F(x_0) = (2\pi)^{-1/2} \int_{x_0}^{\infty} e^{-u^2/2}\,du$

The right-hand side of Eq. (7) is asymptotically equivalent to
$f(x_0)(x_0^{-1} - x_0^{-3} + \ldots)$, so that from Eq. (3)

$\dfrac{1}{N} = (2\pi)^{-1/2} e^{-x_0^2/2} \left( \dfrac{1}{x_0} - \dfrac{1}{x_0^3} + \ldots \right)$

or

$-\tfrac{1}{2} x_0^2 = \log x_0 + \tfrac{1}{2} \log(2\pi) - \log N$

Using only the leading terms for large $N$,

(8.26.8)    $x_0^2 = 2 \log N$

In the exponential distribution of Eq. (4), $\alpha = f(x_0)/[1 - F(x_0)]$. The
corresponding expression for the normal law is asymptotically equal to $x_0$,
and in fact, as proved by Fisher and Tippett [16], the limiting distribution
of the extreme value is the same as that of Eq. (4) with $\alpha = x_0$, where $x_0$ is
given by (8).


8.27 The Effect of Non-Normality   Since we do not usually know whether a
sample comes from a normal universe or not, it is natural to ask what difference
it would make if the universe were not normal. Bartlett ... quite good results
even for considerable departures from normality, although the one-tailed test is more
vulnerable in this respect than the two-tailed test. For a skew parent population
the true significance level may be considerably under- or overestimated by using
the ordinary tables. The effects of skewness and kurtosis in the parent population
on the power function of the t-test have been discussed by Srivastava [18]. A
positive skewness tends to reduce the power when the power is low and increase
it when it is high; a negative skewness has the opposite effect. Kurtosis, unless
quite marked, seems to have comparatively little effect. With increase in sample
size, of course, the effect of non-normality diminishes.

Some experimental work on the F-test for samples from non-normal populations
was carried out by E. S. Pearson, and this suggested that the test may be
used with populations differing quite considerably from normal without serious
error. W. G. Cochran [19] has discussed the effect of non-normality on the t-test
and F-test, and concludes that a tabular 5% may perhaps mean anything from 4%
to 7% and a tabular 1% anything between 1/2% and 2%. The effect of non-normality
is usually to increase the apparent significance of results, which
suggests caution when interpreting results near the borderline of significance.

Unless data are very extensive, it is seldom possible to demonstrate that they 
are not normal. The standard errors of skewness and kurtosis are so large with 
samples of moderate size that only very marked non-normality could be detected. 
If there is reason to suspect non-normality, from the nature of the data, it is 
advisable to try a transformation. The logarithm of the variate, or the square 
root, or the inverse sine, may be more nearly normal (see Chapters 3 and 4, 
§§ 3.15, 3.16 and 4.8, and also reference [20]). 


PROBLEMS 


A. (§§ 8.1-8.4)

1. If X is normally distributed with mean 0 and variance 1, and if m is the mean of 
a random sample of 16 items, show that the odds are about 370 to 1 against obtaining 
an m numerically greater than 1. 

2. Assume that the mean age at death of men who are alive at age 20 is 59.13 years, 
with a standard deviation of 10.2 years. An insurance company would like to feel 
fairly sure (probability at least 0.99) that the mean age at death in its own group of 
men aged 20 will not differ from 59.13 years by more than 1 year. Assuming a normal
distribution, how large should the group be? 

3. The mean of a particular normal distribution is equal to the standard deviation 
of the mean of samples of 100 from the same distribution. Find the probability that the 
mean of a sample of 25 will be negative. 

4. How large a sample should be taken from a normal population if the probability 
is to be 0.95 that the sample mean will not differ from the population mean by more 
than one-quarter of the population standard deviation? 

5. Prove that the density function of the statistic $k_2$, for given $\sigma$, has a maximum at
$k_2 = (n - 2)\sigma^2/n$, where $n = N - 1$. (Hence the estimator $nk_2/(n - 2)$ has the
property that its most likely value is the true value $\sigma^2$.)

6. Show that the moment generating function of the distribution of $k_2$ is
$M(h) = (1 - 2h\sigma^2/n)^{-n/2}$. Hence obtain the c.g.f., and the expectation and variance of $k_2$.

7. For what value of $\alpha$ is the expectation of $(\alpha k_2 - \sigma^2)^2$ a minimum? (The quantity
$\alpha k_2$ is a "least squares" estimator of $\sigma^2$, different from the unbiased estimator and the
one in Problem 5).

8. If $s$ is the positive square root of $k_2$, prove that


вө") 
rom a=) 


and 


Hence show that 


and 


o? d ou 
vo- (S)( -z o2) 


Hint: Use the Stirling formula,

$\log \Gamma(x) \approx \tfrac{1}{2} \log 2\pi + (x - \tfrac{1}{2}) \log x - x + \dfrac{1}{12x}$

to evaluate $\log \Gamma(x)$. See (8.6.4).


B. (§§ 8.5-8.13)

1. Four different boxes of Eddy's matches, from the same carton, contained 55, 58,
53 and 57 matches. Obtain 95% confidence limits for the mean number of matches in
boxes of the same kind.

2. The tensile strength (X), in pounds, of a certain type of cable was measured for
12 samples. The results were: 182, 178, 185, 184, 180, 179, 177, 185, 174, 179, 183, 186.
Calculate 90% confidence limits for the mean of X in this type of cable. Hint: Use the
auxiliary variate U = X - 180.

3. A machine producing mica insulating washers is supposed to turn them out with
a mean thickness of 10 mils (1 mil = 0.001 in). A random sample of nine washers
from the output of this machine has a mean thickness 9.5 mils with a standard deviation
0.60 mil. Is the output significantly different from standard with respect to thickness?

4. ... at a certain site, 16 lower first molars were found with mean length 13.57 mm
and standard deviation 0.72 mm. From ... were taken with mean 13.06 mm and standard
... population?
5. Two samples of herring were measured for length (mm) with the following
results:
1 178 
(1) 192, 179, 181, 193, 215, 181, 178, 185, 160 


(2) 173, 194, 194, 187, 168, 186, 176, 191, 191, 17 
Find 95% confidence limits for the difference in the mean lengths for the two
populations.

6. Twelve hogs were fed on diet A, fifteen others on diet B. The gains in weight
for the individual hogs in pounds, over the same period, were as follows:
A 25, 30, 28, 34, 24, 35, 13, 32, 24, 30, 31, 35.
B 44, 34, 22, 18, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22.
On the assumption that the diet may affect the mean gain without affecting the
variance of gains, obtain 90% confidence limits for the average increase in gain with
diet B over that with diet A.

7. A physiological experiment was carried out to test the effect of an injection of
secretin on the percentage of reticulocytes in the blood of rabbits. Seventeen rabbits
were tested, before and after injection, and the mean increase was 0.0635. The standard
deviation of the increases was 0.168. Was there a significant effect at the 5% level?

8. A paired feeding experiment on pigs was conducted to determine the relative
value of limestone and bone meal for bone development. The variable is the percentage
ash content in the shoulder-blade.

Pair Number    Limestone    Bone Meal
     1           49.2         37.5
     2           53.3         54.9
     3           50.6         52.2
     4           52.0         53.3
     5           46.8         57.6
     6           50.5         54.1
     7           52.1         54.2
     8           53.0         54.3

Is the difference between the two sets of values significant at the 5% level?

9. (Snedecor) An agronomist, interested in the effect of superphosphate on the yield 
of corn, added the fertilizer to a mixture of manure and lime. Five pairs of adjacent 
plots were used for the trial, the plots in each pair being as alike as possible except that 
one was treated with the old fertilizer (without superphosphate) and the other with the 
new. The plots with the new fertilizer yielded, respectively, 20, 6, 4, 3 and 2 bushels 
per acre more than the corresponding controls. Was the value of the superphosphate 
demonstrated?

If the increased yields had been 5, 6, 4, 3 and 2 bushels per acre, would the verdict 
have been different? Explain the apparent paradox. 

10. Complete the proof of the statement in § 8.5 that Student's t-distribution tends
to normal as $n \to \infty$. Hint: See Eq. (8.6.4).

11. If $t = \sqrt{n} \cot \phi$, show that the density function for $\phi$ is $C \sin^{n-1} \phi$ $(0 < \phi < \pi)$,
where $C = 1/B(1/2, n/2)$.

12. A sample of size 20 is used for testing the hypothesis that $\mu = 0$ against the
alternative hypothesis that $\mu = 0.5\sigma$, where $\sigma$ is the population standard deviation.
If Student's $t$ is used as the criterion, and the size of the test is 0.05, what is the power?
Hint: Use the approximation of Eq. (8.12.5).

13. It is desired to test the hypothesis that $\mu = 0$ against the alternative hypothesis
that $\mu = \mu_1$ $(\mu_1 > 0)$. If the standard deviation of the population is 10 units and a
sample of size 17 is used, find the least value of $\mu_1$ that could be detected by the t-test,
assuming that the risks for both kinds of error are not more than 0.05.

14. Two samples, each of size 10, come from populations with means $\mu_1$ and $\mu_2$ and
a common variance $\sigma^2$. The null hypothesis $H_0$ is that $\mu_2 - \mu_1 \le 0$ and the alternative
$H_1$ is that $\mu_2 - \mu_1 = k\sigma$ $(k > 0)$. Show that if $\alpha$ is not greater than 0.05, a value of $k$
at least 1.37 could be detected by the t-test, with power at least 0.9. Hint: If the two
samples have means $m_1$ and $m_2$ and variances $s_1^2$ and $s_2^2$, the quantity
$(m_2 - m_1)\{(s_1^2 + s_2^2)/10\}^{-1/2}$ has the t-distribution with 18 d.f. if $\mu_1 = \mu_2$. Under $H_1$ it has a
non-central t-distribution with parameter $k\sqrt{5}$, since the variance of $m_1 - m_2 = \sigma^2/5$.

15. A large lot of manufactured articles is rated on the percentage $p$ of defective
items, an item being reckoned defective if the value of a normal variate X is at least 3.
A prospective purchaser will want to reject a lot with probability 0.95 if $p > 2.5$. A
sample of 10 items is used for a non-central t-test. What criterion should be used for
accepting the lot? Hint: Find $\delta$ from Eq. (8.13.3); $k$ can be obtained from the tables
of non-central $t$, using Eq. (8.13.5) with $P = 0.05$, or approximately from Eq. (8.12.5)
with $z_P = 1.645$ and $t_0 = k\sqrt{10}$.


C. (§§ 8.14-8.17)

1. The variance of a random sample of size 5 is 29.83. Calculate 90% confidence 
limits for the population variance. 

2. Two chemists, A and B, each repeat a protein analysis 20 times. If the sets of
values obtained are denoted by $X_i$, $Y_i$, respectively, it is found that $\sum X_i = 196.40$,
$\sum X_i^2 = 1928.6560$, $\sum Y_i = 205.16$, $\sum Y_i^2 = 2104.7152$. Determine whether there is a
significant difference in precision between the two sets of analyses, precision being
inversely proportional to the variance.

3. In two series of hauls to determine the number of plankton organisms inhabiting 
the waters of a lake, the following results were found: 

Series I: 80, 96, 102, 77, 97, 110, 99, 88, 103, 108 

Series II: 74, 122, 92, 81, 104, 92, 90.
In Series I the hauls were made in succession at the same place; in Series II they were
made at different points scattered over the lake. Does there appear to be a greater 
variability between different places than exists at different times at the same place? 

4. For the data on feeding of hogs in Problem B-6, determine whether the assumption
of a common variance under both diets is justified.

5. If $x = n_2/(n_2 + n_1 F)$ prove that $x$ is a beta-variate with parameters $n_2/2$, $n_1/2$.

6. When $n_1 = 2$, show that the upper significance level of F corresponding to
probability $p$ is $n_2(p^{-2/n_2} - 1)/2$. Hint: The integral of $g(F)$ from $F_1$ to $\infty$ is $p$, where
$F_1$ is the required level.

7. Find the upper 5% point for F with 2 and 4 degrees of freedom by direct integration
of $g(F)$. Compare with the value in Table B.5 in the appendix.

8. What is the smallest ratio A of two variances (A > 1) that can be detected by an
F-test with two samples of size 10, the size of the test being 0.05 and the power 0.95?
Hint: When $n_1 = n_2$, the 95% point for F is the reciprocal of the 5% point.

9. An approximation to Fisher's $z$ for given $P$, $n_1$ and $n_2$ has been devised by A. H.
Carter, namely,

$z_F = z_P \dfrac{(h + k)^{1/2}}{h} - \left( \dfrac{1}{n_1} - \dfrac{1}{n_2} \right) \left( k + \dfrac{5}{6} - \dfrac{s}{3} \right)$

where $s = 1/n_1 + 1/n_2$, $h = 2/s$, $k = (z_P^2 - 3)/6$ and $z_P$ is the normal standard variate
exceeded with probability $P$. Use this approximation for $n_1 = n_2 = 19$ to find $z_F$, and
hence $F$ ... Then determine the smallest variance ratio detectable with
two samples of size 20 and a test of size 0.05 and power 0.25. Hint: $z_F = \tfrac{1}{2} \log_e F$.


D. (§§ 8.18-8.26)

1. A sample of 10 observations is taken from a normal population with mean 250 and
standard deviation 10. What value for the largest member of the sample would be
exceeded only once in 20 samples; what value only once in 100 samples?

2. A quantity is measured 10 times with the following results: 236, 251, 249, 252,
248, 254, 246, 257, 243, 274. Should the largest of these observations be rejected
according to ...'s criterion?

3. For the following measurements of an angle (degrees and minutes omitted, values
in seconds of arc) would it be reasonable to reject the lowest reading? 51.75, 47.85,
47.40, 48.90, 44.45, 48.45, 51.05, 48.85, 50.95, 50.60, 47.75, 49.20, 50.55.

4. Apply Dixon's criterion to the data of Problems 2 and 3.
5. The following frequency distribution was found for the range R in 200 samples 
of size 10 from an artificial, approximately normal, population with mean 20 and 


standard deviation 4: 
R 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 


ў 2 4 4 ТЕ 11 20 25 28 25 17 13 19 5 9 $3 3 3 O0 1 


Calculate the mean range and estimate the standard deviation of the population,
assuming that the true value 4 is unknown. Hint: The estimator of $\sigma$ is $k\bar{R}$, where $k$
is found from Table 8.5.

6. Let $u$ denote the mid-range of a sample, that is, $u = (x_1 + x_N)/2$. For the rectangular
population with density $f(x) = \tfrac{1}{2}$, $-1 < x < 1$, the density for the mid-range
is given by $g(u) = \tfrac{1}{2} N(1 - |u|)^{N-1}$. Prove that the variance of the mid-range is
$2/[(N + 1)(N + 2)]$. Hence show that the mid-range is, for all $N > 2$, more efficient
than the arithmetic mean as an estimator of the population mean. Hint: Separate the
interval of integration into two parts, $-1$ to 0, and 0 to 1.

7. A sample of size $N$ is taken from the exponential population with density $e^{-x}$,
$x \ge 0$. Find the density function of the range and show that its expected value is
$1 + \tfrac{1}{2} + \tfrac{1}{3} + \ldots + 1/(N - 1)$. Hint: In the integral for $E(R)$ put $u = 1 - e^{-R}$, and
expand $\log(1 - u)$ in a series.

8. Samples of size 4 are taken from the population of Problem 7. Find 95%
confidence limits for the range in such samples.

9. For a sample of size $N$ from the rectangular distribution $f(x) = 1/b$, $0 < x < b$,
show that $R/b$ is a beta-variate with parameters $N - 1$ and 2. Hence obtain the mean
and variance of the distribution of $R$.

10. Numbers are drawn at random from the interval (0,1). How many are required
before the probability will exceed 0.95 that the range of the sample will be at least 0.5?
Hint: Show that $N$ is given by the inequality $2^{N-2} > 5(N + 1)$. Solve by trial for small
values of $N$.

11. A sample of odd size $N$ $(= 2r + 1)$ is taken from the rectangular population
with density 1, $0 < x < 1$. The median is the $(r + 1)$th member of the sample when
arranged in ascending order. Prove that the expectation of the median is $\tfrac{1}{2}$ and its
variance is $1/(4N + 8)$.

12. Prove that the density function for the range $w$ in samples of size 3 from a
standard normal population is $g(w) = (3/\pi^{1/2}) e^{-w^2/4} [2\Phi(w/\sqrt{6}) - 1]$. Hint: Use Eq.
(8.21.4) with $F(x) = \Phi(x)$, and $w$ for $R$. Put $x_1 = (v - w)/2$ and obtain

$g(w) = 3(2\pi)^{-3/2} e^{-w^2/4} \int_{-\infty}^{\infty} \int_{(v-w)/2}^{(v+w)/2} e^{-v^2/4} e^{-u^2/2}\,du\,dv$

Change to oblique coordinates $x = u - v/2$, $z = v\sqrt{3}/2$ and integrate over the strip
of horizontal width $w$, $x$ going from $-w/2$ to $w/2$ and $z$ from $-\infty$ to $\infty$.

Chapter 9
ANALYSIS OF VARIANCE 


9.1 Tests of Homogeneity of Variance The analysis of variance is a widely 
used technique for separating the observed variance in a group of samples into 
portions which are traceable to different sources. Thus the different samples 
may have undergone different treatments which possibly affect the general level 
of the measured variate X. If all the samples are lumped together into one grand 
sample, the observed variance will be partly due to differences between the 
individual members of the same original sample and partly due to the effects 
of the different treatments. The method of analysis of variance enables us to 
estimate how much of the variance is attributable to the one cause and how 
much to the other, and so to decide whether or not the treatments have produced 
any significant effects. 

Much more elaborate experimental designs than this can be analysed by
comparable methods, and several of the more usual designs will be considered
in this chapter. All the common analysis-of-variance tests rest on certain
assumptions, such as normality of the distribution of X and additivity of treatment
effects, and among these assumptions is one on the constancy of variance
as between samples. It is supposed that, apart from possibly affecting the
average value of X, the different treatments (for instance) do not change the
sample variances. Methods which do not depend on these assumptions will be
mentioned later on, but meanwhile a test for the homogeneity of variance as
between a group of samples will be considered.

For two samples, the technique of § 8.15 may be used. We suppose therefore
that we have $k$ samples $(k > 2)$, and that for the $i$th sample, of size $N_i$, the
observed values are $x_{ij}$, $j = 1, 2 \ldots N_i$, $i = 1, 2 \ldots k$. We assume that all the
samples are independent and come from normal populations with means $\mu_1$,
$\mu_2 \ldots \mu_k$ and variances $\sigma_1^2, \sigma_2^2 \ldots \sigma_k^2$. The null hypothesis $H_0$ is that
$\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_k^2$ ($= \sigma^2$, say). The alternative hypothesis $H_1$ is that these
variances are not all equal.

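Under $H_0$, the likelihood function is

(9.1.1)    $L_0 = (2\pi\sigma^2)^{-N/2} \exp\left[ -\dfrac{1}{2\sigma^2} \sum_i \sum_j (x_{ij} - \mu_i)^2 \right]$

where $N = \sum N_i$. Under $H_1$, the likelihood function is

(9.1.2)    $L_1 = \prod_i (2\pi\sigma_i^2)^{-N_i/2} \exp\left[ -\sum_i \dfrac{1}{2\sigma_i^2} \sum_j (x_{ij} - \mu_i)^2 \right]$

The joint maximum likelihood estimators of $\mu_i$ and $\sigma^2$ under $H_0$ are

(9.1.3)    $\hat{\mu}_i = \dfrac{1}{N_i} \sum_j x_{ij} = \bar{x}_i$,    $\hat{\sigma}^2 = \dfrac{1}{N} \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = \dfrac{1}{N} \sum_i S_i$

where

(9.1.4)    $S_i = \sum_j (x_{ij} - \bar{x}_i)^2$

which is the sum of squares of deviations from the mean for the $i$th sample. The
sample variance is $v_i = S_i/n_i$ where $n_i = N_i - 1$.

The joint maximum likelihood estimators of $\mu_i$ and $\sigma_i^2$ under $H_1$ are

(9.1.5)    $\hat{\mu}_i = \bar{x}_i$,    $\hat{\sigma}_i^2 = S_i/N_i$

The likelihood ratio is, therefore,

(9.1.6)    $L = \dfrac{(L_0)_{\max}}{(L_1)_{\max}} = \dfrac{(S_1/N_1)^{N_1/2} \cdots (S_k/N_k)^{N_k/2}}{(S/N)^{N/2}}$

where $S = \sum S_i$. Then $H_0$ will be rejected if $L < c$. The constant $c$ is so
chosen that

(9.1.7)    $P(L < c \mid H_0) \le \alpha$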
(9.1.7) P(L < c|Ho) < а 

As mentioned in § 6.9, the distribution of $-2 \log L$ for large $N$ is approximately
$\chi^2$ with $k - 1$ degrees of freedom. This number is the number of
parameters under $H_1$, namely, $2k$, less the number under $H_0$, namely, $k + 1$.
In this case,

(9.1.8)    $-2 \log L = N \log \dfrac{S}{N} - \sum_i N_i \log \dfrac{S_i}{N_i}$

It was shown by Bartlett [1], that the approximation to $\chi^2$ may be improved
by replacing each $N_i$ by $n_i$ ($= N_i - 1$) and therefore $N$ by $n$ ($= N - k$). In
effect this replaces the maximum likelihood estimators by unbiased estimators.
Furthermore, the approximation will hold reasonably well down to values of $n_i$
as small as 4 or 5 if a correcting factor is introduced in Eq. (8). We can therefore
in most cases assume that the quantity

(9.1.9)    $M = \dfrac{1}{C} \left\{ n \log \dfrac{S}{n} - \sum_i n_i \log \dfrac{S_i}{n_i} \right\}$

is distributed like $\chi^2$ with $k - 1$ d.f., where

(9.1.10)    $C = 1 + \dfrac{1}{3(k - 1)} \left( \sum_i \dfrac{1}{n_i} - \dfrac{1}{n} \right)$

A small value of $L$ will mean a large value of $M$, since $M = -2 \log L$, and
this will lead to rejection of the null hypothesis. It should be noted that the
logarithms in Eq. (9) are to base $e$.

For even smaller values of $n_i$ (down to 2, say) an improved approximation
was given by Hartley [2], and tables based on this approximation have been 
compiled by Catherine Thompson and Maxine Merrington [3]. 


EXAMPLE 1   To test the effect of a small proportion of coal in the sand
used for making concrete, several batches were mixed under practically identical
conditions except for the variation in the percentage of coal. From each
batch, four cylinders were made and tested for breaking strength in lb/in².
One cylinder in the third sample was defective, so there were only three items
in this sample. The results are given in Table 9.1.

TABLE 9.1

Sample No.                  1        2        3        4        5
Percentage coal             0       0.05     0.1      0.5      1.0
Breaking strengths        1690     1550     1625     1725     1530
                          1580     1445     1450     1550     1545
                          1745     1645     1510     1430     1565
                          1685     1545              1445     1520
Mean                      1675     1546     1528     1538     1540
S_i/n_i                   4750     6673     7908    18,475     383
n_i log10(S_i/n_i)       11.03    11.47     7.80    12.80     7.75

From this table, we obtain the values $n = 14$, $S/n = 7619$, and
$n \log (S/n) - \sum n_i \log (S_i/n_i) = 2.303\,[54.35 - 50.85] = 8.06$. Also
$\sum (1/n_i) = 4/3 + 1/2 = 11/6$, so that $C = 1.15$ and $M = 8.06/1.15 = 7.02$.
With 4 d.f., this value of $\chi^2$ corresponds to a $P$ of 0.13, so that even the
rather large differences in the estimates of variance, given by $S_i/n_i$ in the
above table, are not really significant in view of the small sample sizes.
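
The arithmetic of this example can be retraced with a short script. The following is a minimal sketch, not a library routine: the names `bartlett_m` and `chi2_upper_tail_even_df` are illustrative, natural logarithms are used throughout as Eq. (9) requires, and the tail-probability helper relies on the closed form of the chi-square distribution that holds for even degrees of freedom only.

```python
import math

# Breaking strengths from Table 9.1, one list per sample.
samples = [
    [1690, 1580, 1745, 1685],
    [1550, 1445, 1645, 1545],
    [1625, 1450, 1510],
    [1725, 1550, 1430, 1445],
    [1530, 1545, 1565, 1520],
]


def bartlett_m(groups):
    """Return (M, C, d.f.) as defined in Eqs. (9.1.9) and (9.1.10)."""
    k = len(groups)
    n_i = [len(g) - 1 for g in groups]                 # n_i = N_i - 1
    s_i = []
    for g in groups:
        mean = sum(g) / len(g)
        s_i.append(sum((x - mean) ** 2 for x in g))    # S_i of Eq. (9.1.4)
    n = sum(n_i)                                       # n = N - k
    s = sum(s_i)                                       # S = sum of the S_i
    uncorrected = n * math.log(s / n) - sum(
        ni * math.log(si / ni) for ni, si in zip(n_i, s_i)
    )
    c = 1 + (sum(1 / ni for ni in n_i) - 1 / n) / (3 * (k - 1))
    return uncorrected / c, c, k - 1


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x); closed form valid for positive even d.f. only."""
    if df <= 0 or df % 2:
        raise ValueError("this helper handles positive even d.f. only")
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


m_stat, c_factor, df = bartlett_m(samples)
p_value = chi2_upper_tail_even_df(m_stat, df)
print(f"C = {c_factor:.3f}, M = {m_stat:.2f}, d.f. = {df}, P = {p_value:.2f}")
```

Because the script carries full precision instead of two-decimal logarithms, its intermediate figures may differ slightly in the last digit from those quoted in the text.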

Thompson and Merrington's tables should preferably be used for values of n; 
as small as those appearing in the above table. These give the 5% point for the 
distribution of —2 log L as about 10.7 and the 1% point as about 14.9. The 
observed value, 8.06, is therefore not significant at the 5% level. 


9.2 A Test for Difference of Means in k Samples The simplest application 
of analysis of variance occurs in the problem of deciding whether a group of 
samples come from populations which differ from one another in respect of 
their mean values of some measured variate X. It is assumed that they do not 
differ as regards the variance of X, and this homogeneity of variance may 
be tested by the methods given in § 9.1. An example is the measurement of
breaking strength on the five samples of concrete cylinders in Table 9.1, in 
which the test of homogeneity has been shown to be satisfied. 

If $x_{ij}$ is the measured value of $X$ on the $j$th member of the $i$th sample,
$j = 1, 2 \ldots N_i$, $i = 1, 2 \ldots k$, the mathematical model we assume is

(9.2.1)    $x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

where for every $i$ and $j$, $\varepsilon_{ij}$ is normally distributed with expectation 0 and
variance $\sigma^2$. This means that the measured $x_{ij}$ for any individual item is made
up of three parts which are added together: an over-all average value denoted
by $\mu$, an effect due to the particular treatment undergone by the $i$th sample,
denoted by $\alpha_i$ (and supposed to be the same for all members of this sample),
and a random or error term $\varepsilon_{ij}$ due to many unspecified causes. These $\varepsilon_{ij}$ are
supposed to be uncorrelated with each other.
Adding the $x_{ij}$ for all items in all $k$ samples, we get

(9.2.2)    $\sum_{ij} x_{ij} = N\mu + \sum_i N_i \alpha_i + \sum_{ij} \varepsilon_{ij}$

where $N = \sum_i N_i$. We can suppose that $\mu$ and the $\alpha_i$ are so adjusted that
$\sum_i N_i \alpha_i = 0$. If this does not happen to be the case at first, and if $\sum_i N_i \alpha_i = h$,
we simply have to subtract $h/N$ from each $\alpha_i$ and add $h/N$ to $\mu$. Then if $\bar{x}$ is
the over-all mean of the $x_{ij}$ ($= N^{-1} \sum_{ij} x_{ij}$), we see that $\bar{x}$ is an unbiased estimator
of $\mu$. The total sum of squares for all the $x_{ij}$ may be defined by
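(9.2.3)    $S_t = \sum_{ij} (x_{ij} - \bar{x})^2 = \sum_{ij} x_{ij}^2 - G$

where $G = N\bar{x}^2 = \left( \sum_{ij} x_{ij} \right)^2 / N$. Now

           $x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})$

where $\bar{x}_i$ is the mean of $X$ for the $i$th sample, and therefore

           $(x_{ij} - \bar{x})^2 = (x_{ij} - \bar{x}_i)^2 + (\bar{x}_i - \bar{x})^2 + 2(\bar{x}_i - \bar{x})(x_{ij} - \bar{x}_i)$.

The first expression for $S_t$ above can then be written

(9.2.4)    $S_t = \sum_{ij} (x_{ij} - \bar{x}_i)^2 + \sum_i N_i (\bar{x}_i - \bar{x})^2 + 2 \sum_i \left[ (\bar{x}_i - \bar{x}) \sum_j (x_{ij} - \bar{x}_i) \right]$

The last term in this equation vanishes since $\sum_j (x_{ij} - \bar{x}_i) = 0$. Also
$\sum_j (x_{ij} - \bar{x}_i)^2$ is the sum of squares for the $x_{ij}$ belonging to the $i$th sample,
which we may denote by $S_i$, so that the first term on the right-hand side of
Eq. (4) is $\sum_i S_i$. This is generally called the sum of squares within samples,
denoted by $S_w$. The remaining term depends on the means of the various
samples and their relation to the over-all mean. It is called the sum of squares
between samples, denoted by $S_b$. Then

(9.2.5)    $S_t = S_w + S_b$

where

(9.2.6)    $S_w = \sum_{ij} (x_{ij} - \bar{x}_i)^2 = \sum_{ij} x_{ij}^2 - \sum_i N_i \bar{x}_i^2 = \sum_{ij} x_{ij}^2 - \sum_i \dfrac{\left( \sum_j x_{ij} \right)^2}{N_i}$

and

(9.2.7)    $S_b = \sum_i N_i (\bar{x}_i - \bar{x})^2 = \sum_i N_i \bar{x}_i^2 - N\bar{x}^2 = \sum_i \dfrac{\left( \sum_j x_{ij} \right)^2}{N_i} - G$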

It should be noted that the splitting up of the total sum of squares into the
two parts $S_w$ and $S_b$ is a matter of algebra and does not depend on any
assumptions about the normality of the distribution or the constancy of variance
between samples. However, on the null hypothesis that all the $x_{ij}$ come from
a single normal population with variance $\sigma^2$ (this is equivalent to assuming
that all the $\alpha_i$ in Eq. (1) are separately zero), it follows that $S_t/\sigma^2$ is a $\chi^2$-variate
with $N - 1$ degrees of freedom. Similarly, within the $i$th sample, $S_i/\sigma^2$ is a
$\chi^2$-variate with $N_i - 1$ d.f., so that, by the addition theorem for independent
$\chi^2$-variates, $S_w/\sigma^2$ is distributed like $\chi^2$ with $\sum (N_i - 1) = N - k$ d.f. Theorem
4.3 then tells us that $S_b/\sigma^2$ is a $\chi^2$-variate with $N - 1 - (N - k) = k - 1$ d.f.,
and is independent of $S_w$.

Since the expectation of a y?-variate is equal to the number of degrees of 
freedom, it follows that 


           $E(S_t/\sigma^2) = N - 1$
(9.2.8)    $E(S_w/\sigma^2) = N - k$
           $E(S_b/\sigma^2) = k - 1$
so that
           $E[S_t/(N - 1)] = \sigma^2$
(9.2.9)    $E[S_w/(N - k)] = \sigma^2$
           $E[S_b/(k - 1)] = \sigma^2$

Furthermore, the ratio of the two independent unbiased estimators of $\sigma^2$,
namely $S_b/(k - 1)$ and $S_w/(N - k)$, has the $F$ distribution with $k - 1$ and $N - k$
degrees of freedom. These estimators are usually called “mean squares.” The
result is set out in an Analysis of Variance table, such as Table 9.2.


TABLE 9.2

Variation           Sum of Squares    Degrees of Freedom    Mean Square
                        (S.S.)              (D.F.)             (M.S.)
Between samples         $S_b$              $k - 1$          $S_b/(k - 1)$
Within samples          $S_w$              $N - k$          $S_w/(N - k)$
Total                   $S_t$              $N - 1$          $S_t/(N - 1)$


If the null hypothesis is not true, the $\alpha_i$ will not all be zero, and the mean
square between samples will tend to be greater than the mean square within
samples. The F-test will therefore be a one-tailed test, and the probabilities
given in the table (Appendix B.5) are correct as stated there.
As an illustration we may consider the data of Example 1, § 9.1, in which
$k = 5$ and all the $N_i$ are 4 (except $N_3$, which is 3). We find $\sum_{ij} x_{ij}^2 = 46,842,150$,
$\sum_{ij} x_{ij} = 29,780$, whence $G = 46,676,232$, and $S_t = 165,918$. Also the five
values of $(\sum_j x_{ij})^2$ are $(6700)^2$, $(6185)^2$, $(4585)^2$, $(6150)^2$ and $(6160)^2$, whence
$S_w = 106,661$. The value of $S_b$, by difference, is 59,257. The results of the
analysis, set out in the form of Table 9.2, are as follows:


TABLE 9.3

Variation            S.S.       D.F.     M.S.
Between samples     59,257        4     14,814
Within samples     106,661       14      7,619
Total              165,918       18      9,218


The value of $F$ is $(14814)/(7619) = 1.94$, with 4 and 14 d.f. Since the 5% point
is 3.11 and the 1% point 5.03, it is clear that the observed value is not significant.
We can therefore, as far as this test is concerned, accept the null hypothesis that
the strength of the concrete was not affected by the different amounts of coal in
the sand used in making it.
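
The sums of squares for this illustration can be recomputed directly from the raw data of Table 9.1. This is a minimal sketch using the computing forms of Eqs. (9.2.3) and (9.2.7); the function name `one_way_anova` and the dictionary keys are illustrative choices, not notation from the text.

```python
# Breaking strengths from Table 9.1, one list per sample.
samples = [
    [1690, 1580, 1745, 1685],
    [1550, 1445, 1645, 1545],
    [1625, 1450, 1510],
    [1725, 1550, 1430, 1445],
    [1530, 1545, 1565, 1520],
]


def one_way_anova(groups):
    """Return sums of squares, mean squares and F for k samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    g_term = grand_total ** 2 / n_total                      # G = (sum x)^2 / N
    s_t = sum(x * x for g in groups for x in g) - g_term     # Eq. (9.2.3)
    s_b = sum(sum(g) ** 2 / len(g) for g in groups) - g_term  # Eq. (9.2.7)
    s_w = s_t - s_b                                          # Eq. (9.2.5)
    ms_b = s_b / (k - 1)
    ms_w = s_w / (n_total - k)
    return {
        "S_t": s_t, "S_b": s_b, "S_w": s_w,
        "df_b": k - 1, "df_w": n_total - k,
        "MS_b": ms_b, "MS_w": ms_w, "F": ms_b / ms_w,
    }


table = one_way_anova(samples)
for name, value in table.items():
    print(f"{name:>5}: {value:,.2f}")
```

The printed values can be compared line by line with Table 9.3; small differences in the last digit come from rounding in the hand computation.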
When the $k$ samples are all of the same size, say $r$, $N = rk$ and $\sum \alpha_i = 0$.
Eq. (7) becomes

(9.2.10)    $S_b = \dfrac{1}{r} \sum_i \left( \sum_j x_{ij} \right)^2 - G$


9.3 Two-Way Classification (Complete Blocks) In a somewhat more
complicated experimental design, the attempt is made to estimate two effects
simultaneously. In the example above of the concrete cylinders, we might have
allowed the concrete to set for different periods of time before testing its strength.
In a field experiment on the yield of a certain crop, under different treatments,
we might wish to estimate the effect of different locations of the experimental 
plots. (There might for instance be appreciable effects due to differences of soil 
moisture, drainage, slope, shade, etc., as well as natural differences of soil 
fertility). The experimental procedure is then to set out the plots in distinct 
blocks, the same number of plots in each block, arranged so that as far as 
possible the plots in any one block are relatively similar. The experimental 
treatments are applied randomly to the plots within each block. Figure 44
suggests a possible arrangement in which five treatments are used in each block,
and each treatment is replicated on two plots. This is an illustration of a
“complete block design.” The purpose of randomization is to reduce as far as
possible any systematic effects of the uncontrolled factors in the experiment,
and to give increased justification for applying statistical theory.

FIG. 44  COMPLETE RANDOMIZED BLOCKS

In general we will suppose that we have $a$ treatments and $b$ blocks, and that
each treatment in each block is replicated $r$ times. The total number of
individual items (plots) in the experiment is $N = abr$. A variate $X$ is measured on
each item, and we will denote by $x_{ijk}$ the value of $X$ for the $i$th treatment in the
$j$th block, on the $k$th replicate. We suppose that the $i$th treatment has an effect
on $X$ measured by $\alpha_i$, and that the blocks also have their effect, the $j$th block
contributing $\beta_j$ to $X$. There may also be a differential effect of treatments in
different blocks, known as “interaction.” (The $i$th treatment may not contribute
the same amount to $X$ in each block.) Assuming that these various effects can
be added together, we have as our mathematical model:

(9.3.1)    $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$


where $\mu$ is the over-all average effect, $\gamma_{ij}$ is the interaction (the extra contribution
of the $i$th treatment in the $j$th block over and above the general effect of this
treatment), and $\varepsilon_{ijk}$ is the random effect shown by the $k$th replicate in the $j$th
block under the $i$th treatment. This random part of $x_{ijk}$ is due to all the
miscellaneous causes which may produce an effect but which are not specifically
allowed for in the design of the experiment. It is usually considered as
experimental error.
We can always adjust the origins from which $\alpha_i$, $\beta_j$ and $\gamma_{ij}$ are measured,
so as to satisfy the conditions:

(9.3.2)    $\sum_i \alpha_i = 0$,    $\sum_j \beta_j = 0$,    $\sum_i \gamma_{ij} = 0$,  $j = 1, 2 \ldots b$
           $\sum_j \gamma_{ij} = 0$,  $i = 1, 2 \ldots a$

We may suppose also that $E(\varepsilon_{ijk}) = 0$ and $V(\varepsilon_{ijk}) = \sigma^2$.
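The total sum of squares may be split into four constituent parts, namely,

(9.3.3)    $S_t = S_a + S_b + S_{ab} + S_r$

where

(9.3.4)    $S_t = \sum_{ijk} x_{ijk}^2 - G$

(9.3.5)    $G = N\bar{x}^2 = N^{-1} \left( \sum_{ijk} x_{ijk} \right)^2$

(9.3.6)    $S_a = (br)^{-1} \sum_i \left( \sum_{jk} x_{ijk} \right)^2 - G$

(9.3.7)    $S_b = (ar)^{-1} \sum_j \left( \sum_{ik} x_{ijk} \right)^2 - G$

(9.3.8)    $S_{ab} = r^{-1} \sum_{ij} \left( \sum_k x_{ijk} \right)^2 - S_a - S_b - G$

(9.3.9)    $S_r = \sum_{ijk} x_{ijk}^2 - r^{-1} \sum_{ij} \left( \sum_k x_{ijk} \right)^2$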

The four terms on the right-hand side of Eq. (3) are, respectively, the sum of
squares (S.S.) between treatments, the S.S. between blocks, the S.S. due to
interaction and the S.S. between replicates. The degrees of freedom are $a - 1$,
$b - 1$, $(a - 1)(b - 1)$ and $ab(r - 1)$.
If $\bar{x}_{i..}$ is the mean of $x_{ijk}$ taken over all blocks and replicates for the $i$th
treatment, and if $\bar{x}$ is the mean of all the observed $x_{ijk}$, then

(9.3.10)    $S_a = \sum_{ijk} (\bar{x}_{i..} - \bar{x})^2$

and this is algebraically equivalent to Eq. (6). Similarly,

(9.3.11)    $S_b = \sum_{ijk} (\bar{x}_{.j.} - \bar{x})^2$

where $\bar{x}_{.j.}$ is the mean for the $j$th block, over all treatments and replicates;

(9.3.12)    $S_r = \sum_{ijk} (x_{ijk} - \bar{x}_{ij.})^2$

where $\bar{x}_{ij.}$ is the mean over the $r$ replicates for the $i$th treatment in the $j$th block;
and

(9.3.13)    $S_{ab} = \sum_{ijk} (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x})^2$

The mean squares may be calculated as before. On the null hypothesis that all
the $\alpha_i$, $\beta_j$ and $\gamma_{ij}$ are zero, and on the assumption that the $\varepsilon_{ijk}$ are normally
distributed, these mean squares are all unbiased estimators of $\sigma^2$. Moreover,
the mean squares for treatments, blocks and interactions are independent of
the mean square for replicates, so that the ratios

    $\dfrac{S_a}{S_r} \cdot \dfrac{ab(r - 1)}{a - 1}$,    $\dfrac{S_b}{S_r} \cdot \dfrac{ab(r - 1)}{b - 1}$    and    $\dfrac{S_{ab}}{S_r} \cdot \dfrac{ab(r - 1)}{(a - 1)(b - 1)}$

all have the $F$ distribution with the appropriate degrees of freedom. These
ratios can therefore be used to test whether there are significant treatment
effects, block effects, or interaction effects.

Unless $r$ is greater than 1 there is no possibility, with this design, of testing
for interaction by the ordinary F-test. If $r = 1$, the interaction effect is generally
ignored or treated as part of the error. If it is assumed, however, that $\gamma_{ij}$ is of
the form $C \alpha_i \beta_j$, where $C$ is constant, an F-test of the hypothesis $\gamma_{ij} = 0$ is
possible. See [7], p. 130.


EXAMPLE 2  Tests were carried out on sheets of building material for
permeability [4]. Specimens were selected from the output of each of three machines
on each of nine days, and for each machine on each day three sheets were
examined. The raw materials all came from a common store, but it was thought
that the machines might vary in their quality of output and might also vary
from day to day. The machines may be regarded as “treatments” and the days
as “blocks,” and there were three replicates. The randomizing within blocks
was done by varying the order of sampling from the machines on the different
days.


TABLE 9.4

Variation              S.S.     D.F.     M.S.
Between machines      0.9168      2     0.4584
Between days          0.5534      8     0.0692
Interaction           0.8657     16     0.0541
Between replicates    2.0150     54     0.0373
Total                 4.3509     80


In this experiment the measured variate was the permeability (an average
of eight measurements on each sheet). Since it appeared that the logarithm
of the variate was more nearly normally distributed than the variate itself,
the values in the above table all relate to the common log of the permeability.
This is denoted by $x_{ijk}$. From the experimental data, we find

    $N = 3 \cdot 9 \cdot 3 = 81$
    $\sum_{ijk} x_{ijk} = 127.093$,    $G = 199.4152$
    $\sum_{ijk} x_{ijk}^2 = 203.7661$,    $S_t = 4.3509$
    $\sum_i \left( \sum_{jk} x_{ijk} \right)^2 = 5408.9627$,    $S_a = 0.9168$
    $\sum_j \left( \sum_{ik} x_{ijk} \right)^2 = 1799.7177$,    $S_b = 0.5534$
    $\sum_{ij} \left( \sum_k x_{ijk} \right)^2 = 605.2533$,    $S_r = 2.0150$

and therefore $S_{ab} = 0.8657$. The analysis of variance is given in Table 9.4.

The F-ratio for interaction is 1.45, with 16 and 54 d.f. Since the 5% point is 1.83,
the hypothesis of zero interaction is not rejected.

The F-ratio for days is 1.85, with 8 and 54 d.f. This is also non-significant
at the 5% level.

The F-ratio for machines is 12.3, with 2 and 54 d.f. The 1% point is 5.02,
so that there is a highly significant effect of the differences between machines.
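
The entries of Table 9.4 and the three F-ratios follow from the reported totals by the computing forms of § 9.3. The sketch below simply retraces that arithmetic; the variable names are illustrative, and the sum of squared machine totals is taken as 5408.9627, the reading of the printed figure that is consistent with $S_a = 0.9168$.

```python
# Layout of Example 2: a machines, b days, r sheets per machine per day.
a, b, r = 3, 9, 3
n = a * b * r

# Totals reported in the text (common logs of permeability).
sum_x = 127.093
sum_x2 = 203.7661
sum_sq_machine_totals = 5408.9627   # assumed reading, see the note above
sum_sq_day_totals = 1799.7177
sum_sq_cell_totals = 605.2533

g = sum_x ** 2 / n                               # Eq. (9.3.5)
s_t = sum_x2 - g                                 # Eq. (9.3.4)
s_a = sum_sq_machine_totals / (b * r) - g        # Eq. (9.3.6)
s_b = sum_sq_day_totals / (a * r) - g            # Eq. (9.3.7)
s_ab = sum_sq_cell_totals / r - s_a - s_b - g    # Eq. (9.3.8)
s_r = sum_x2 - sum_sq_cell_totals / r            # Eq. (9.3.9)

ms_a = s_a / (a - 1)
ms_b = s_b / (b - 1)
ms_ab = s_ab / ((a - 1) * (b - 1))
ms_r = s_r / (a * b * (r - 1))

# Each effect is compared with the mean square between replicates.
f_machines = ms_a / ms_r
f_days = ms_b / ms_r
f_interaction = ms_ab / ms_r

print(f"S_a = {s_a:.4f}, S_b = {s_b:.4f}, S_ab = {s_ab:.4f}, S_r = {s_r:.4f}")
print(f"F(machines) = {f_machines:.2f}, F(days) = {f_days:.2f}, "
      f"F(interaction) = {f_interaction:.2f}")
```

Significance points are not computed here; they are read from the F table as in the text.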

If there are no interactions, the conclusions about the main effects are much 
simplified. If there seems to be a real difference between two treatments, for 
example, we can conclude that this difference persists in all blocks. But if there 
is appreciable interaction, а significant difference between the two treatments 
merely means that, on the average over all blocks, there is a difference. In some 
particular block this difference might not exist or might even be reversed in sign. 

It may happen that the hypothesis of no interactions will be rejected by the 
ordinary statistical test while at the same time the hypothesis of zero main 
effects will be accepted. This means that there certainly are non-zero differences 
between blocks or treatments, but that when the block differences are averaged 
over the treatments, or the treatment differences over the blocks, the averages
are not significantly different from zero.


9.4 Estimation of Fixed Treatment Effects (Model I) There are two ways 
of looking at the treatment effects. They may be regarded as fixed effects or as 
random variables, and in different situations either the one way or the other 
may be more appropriate. In Example 1 of § 9.1, the “treatments” were fixed
percentages of coal in the sand used for making concrete, and any conclusions 
drawn from the experiment would presumably refer to these percentages and 
these only. However, it is conceivable that the specimens of sand used might 
have contained variable amounts of coal, drawn at random from some parent 
distribution, and in this case we could estimate the variance of the effect of 
added coal and apply the results of the experiment to percentages of coal outside
the values actually observed.

In Example 2, above, the machine effects should probably be regarded as
fixed, and the results applied to the particular machines used. The days, how- 
ever, might well be considered a random sample of days, unless there was a 
special reason for selecting particular days for the experiment. 

The mathematical model with fixed effects, which is the one we have been
using, is sometimes called Model I. For the one-way classification of § 9.1,
we have

(9.4.1)    $x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$,    $i = 1, 2 \ldots k$,  $j = 1, 2 \ldots N_i$

and the mean of the $i$th sample is

(9.4.2)    $\bar{x}_{i.} = \mu + \alpha_i + \bar{\varepsilon}_{i.}$

The over-all weighted mean of the $\bar{x}_{i.}$, with weights $N_i$, is

(9.4.3)    $\bar{x} = \mu + \bar{\varepsilon}$

since $\sum N_i \alpha_i = 0$.
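The expectation of $\bar{x}_{i.}$ is, therefore, $\mu + \alpha_i$, and its variance is $\sigma^2/N_i$. The
sum of squares between samples is given by

(9.4.4)    $S_b = \sum N_i (\bar{x}_{i.} - \bar{x})^2$
               $= \sum N_i [\bar{x}_{i.} - \mu - \alpha_i - (\bar{x} - \mu) + \alpha_i]^2$
               $= \sum N_i [\bar{x}_{i.} - \mu - \alpha_i - (\bar{x} - \mu)]^2 + \sum N_i \alpha_i^2$
                 $+ 2 \sum N_i \alpha_i [\bar{x}_{i.} - \mu - \alpha_i - (\bar{x} - \mu)]$

Now $\bar{x}_{i.} - \mu - \alpha_i$ is normal with expectation 0 and variance $\sigma^2/N_i$, and its
weighted mean is $\bar{x} - \mu$. It therefore follows that $\sum (N_i/\sigma^2)[\bar{x}_{i.} - \mu - \alpha_i - (\bar{x} - \mu)]^2$
has the $\chi^2$ distribution with $k - 1$ d.f., and hence

(9.4.5)    $E \sum N_i [\bar{x}_{i.} - \mu - \alpha_i - (\bar{x} - \mu)]^2 = (k - 1)\sigma^2$

Also $E[\bar{x}_{i.} - \mu - \alpha_i - (\bar{x} - \mu)] = 0$, so that

(9.4.6)    $E(S_b) = (k - 1)\sigma^2 + \sum N_i \alpha_i^2$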


This shows that if the $\alpha_i$ are not all zero, the expectation of $S_b/(k - 1)$ is
greater than $\sigma^2$, which justifies the use of the one-tailed F-test for treatment
effects between samples. In the same way,

(9.4.7)    $E(S_t) = (N - 1)\sigma^2 + \sum N_i \alpha_i^2$

and therefore,

(9.4.8)    $E(S_w) = (N - k)\sigma^2$
The fixed treatment effects do not enter into the mean square within samples.
From Eq. (3),

(9.4.9)    $E(\bar{x}) = \mu$

so that $\bar{x}$ is an unbiased estimator of $\mu$. Also, from Eq. (1),

(9.4.10)    $E\left( \sum_j x_{ij} \right) = N_i \mu + N_i \alpha_i$

so that $\alpha_i = E(\bar{x}_{i.}) - \mu$. The quantity $\bar{x}_{i.} - \bar{x}$ is therefore an unbiased
estimator of $\alpha_i$. In Example 1, the estimates of $\alpha_i$ are $\hat{\alpha}_1 = 108$, $\hat{\alpha}_2 = -21$,
$\hat{\alpha}_3 = -39$, $\hat{\alpha}_4 = -30$, $\hat{\alpha}_5 = -27$.
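
These estimates are simply the sample means less the over-all mean, as the following minimal sketch shows for the data of Table 9.1 (the variable names are illustrative). It also checks that the estimates satisfy the weighted constraint $\sum N_i \hat{\alpha}_i = 0$ that was imposed on the $\alpha_i$.

```python
# Breaking strengths from Table 9.1, one list per sample.
samples = [
    [1690, 1580, 1745, 1685],
    [1550, 1445, 1645, 1545],
    [1625, 1450, 1510],
    [1725, 1550, 1430, 1445],
    [1530, 1545, 1565, 1520],
]

n_total = sum(len(g) for g in samples)
mu_hat = sum(sum(g) for g in samples) / n_total           # over-all mean
alpha_hat = [sum(g) / len(g) - mu_hat for g in samples]   # sample mean - over-all mean

# The weighted sum of the estimates should vanish up to rounding error.
weighted_sum = sum(len(g) * a for g, a in zip(samples, alpha_hat))

print(f"mu_hat = {mu_hat:.1f}")
print("alpha_hat =", [round(a) for a in alpha_hat])
print(f"sum of N_i * alpha_hat_i = {weighted_sum:.6f}")
```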

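For the two-way classification of § 9.3, with fixed effects, a similar argument
leads to the following results:

            $E(S_a) = (a - 1)\sigma^2 + br \sum \alpha_i^2$
            $E(S_b) = (b - 1)\sigma^2 + ar \sum \beta_j^2$
(9.4.11)    $E(S_{ab}) = (a - 1)(b - 1)\sigma^2 + r \sum \gamma_{ij}^2$
            $E(S_r) = ab(r - 1)\sigma^2$
            $E(S_t) = (abr - 1)\sigma^2 + br \sum \alpha_i^2 + ar \sum \beta_j^2 + r \sum \gamma_{ij}^2$

The estimators of $\alpha_i$, $\beta_j$, $\gamma_{ij}$ and $\mu$ are

            $\hat{\alpha}_i = \bar{x}_{i..} - \bar{x}$,    $i = 1, 2 \ldots a$
            $\hat{\beta}_j = \bar{x}_{.j.} - \bar{x}$,    $j = 1, 2 \ldots b$
(9.4.12)    $\hat{\gamma}_{ij} = \bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x}$
            $\hat{\mu} = \bar{x}$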

9.5 Estimation of Variable Treatment Effects (Components of Variance)
Model II In Model II, the effects (even including the interaction) are treated
as random variates which are normally and independently distributed. Thus in
Eq. (9.2.1), $\alpha_i$ is regarded as a value of a random variate which has expectation
zero and variance $\sigma_\alpha^2$. We must suppose that the $k$ samples actually examined
are a random selection from a large population of possible samples. The
members of this population may be denoted by the subscript $u$, and for each
there is a “true” or expected value of $X$ which we may denote by $m_u$. This
quantity $m_u$ is a random variable with a certain probability distribution over
the population, and its expected value is $\mu$. The difference between $m_u$ and $\mu$
is the true effect of sample $u$, which we have denoted by $\alpha_u$. The variance of $\alpha_u$
is the quantity $\sigma_\alpha^2$. We have, therefore, for an actually selected sample $i$,

(9.5.1)    $x_{ij} = m_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
where $\varepsilon_{ij}$ is the error term, namely, the difference between the true value $m_i$ and
the measured value on the $j$th replicate. It is assumed that the set of $\alpha_i$ and the
set of $\varepsilon_{ij}$ are completely independent and that the $\varepsilon_{ij}$ have the same variance $\sigma^2$
for all $i$, and therefore,

(9.5.2)    $V(x_{ij}) = \sigma_\alpha^2 + \sigma^2$

The quantities $\sigma_\alpha^2$ and $\sigma^2$ are the components of variance.
It should be noted that two observations in the same sample will be
correlated. Thus the covariance of $x_{ij}$ and $x_{ij'}$ is given by

    $C(x_{ij}, x_{ij'}) = E[(x_{ij} - \mu)(x_{ij'} - \mu)]$
                       $= E[(\alpha_i + \varepsilon_{ij})(\alpha_i + \varepsilon_{ij'})]$
                       $= E(\alpha_i^2) = \sigma_\alpha^2$

since all the other terms have zero expectations by hypothesis. The quantity
$\sigma_\alpha^2/(\sigma_\alpha^2 + \sigma^2)$, which is the correlation coefficient between two observations in
the same sample, is called the intra-class correlation coefficient.
The usual null hypothesis to be tested is that $\sigma_\alpha^2 = 0$, which implies that
$m_u = \mu$ for all values of $u$. If we suppose that all the samples are the same
size ($r$), the sums of squares between samples and within samples are, as before,

(9.5.3)    $S_b = r \sum_i (\bar{x}_{i.} - \bar{x})^2 = r \sum_i [(\mu + \alpha_i + \bar{\varepsilon}_{i.}) - (\mu + \bar{\alpha} + \bar{\varepsilon})]^2$

and

(9.5.4)    $S_w = \sum_{ij} (x_{ij} - \bar{x}_{i.})^2 = \sum_{ij} (\varepsilon_{ij} - \bar{\varepsilon}_{i.})^2$

If the $\varepsilon_{ij}$ are normal with expectation 0 and variance $\sigma^2$, $S_w/\sigma^2$ has the $\chi^2$
distribution with $k(r - 1)$ d.f., so that

(9.5.5)    $E\left[ \dfrac{S_w}{k(r - 1)} \right] = \sigma^2$

If also the $\alpha_i$ are normal with variance $\sigma_\alpha^2$, and if we write $\eta_i$ for the variable
$\alpha_i + \bar{\varepsilon}_{i.}$, then

(9.5.6)    $S_b = r \sum_i (\eta_i - \bar{\eta})^2$

and the $\eta_i$ are independently normal with expectation 0 and variance $\sigma_\alpha^2 + \sigma^2/r$.
It follows that $S_b/[(\sigma_\alpha^2 + \sigma^2/r)r]$ is a chi-square variate with $k - 1$ d.f., so that

(9.5.7)    $E\left[ \dfrac{S_b}{k - 1} \right] = r\sigma_\alpha^2 + \sigma^2$

The sums of squares $S_b$ and $S_w$ are statistically independent. The null hypothesis
is therefore tested by the ordinary F-test of the ratio of mean squares between
and within samples.
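
Equations (9.5.5) and (9.5.7) also suggest how the two components of variance might be estimated: equate each observed mean square to its expectation and solve. The sketch below does this for a balanced layout. It is an illustration under stated assumptions, not a procedure given in this section: the data are hypothetical, the function name is invented, and the moment estimate of $\sigma_\alpha^2$ can come out negative when the mean square between samples falls below the mean square within samples.

```python
def variance_components_one_way(groups):
    """Moment estimates for balanced one-way Model II data.

    Equates the observed mean squares to their expectations in
    Eqs. (9.5.5) and (9.5.7); every sample must have the same size r.
    """
    k = len(groups)
    r = len(groups[0])
    if any(len(g) != r for g in groups):
        raise ValueError("all samples must have the same size r")
    means = [sum(g) / r for g in groups]
    grand = sum(means) / k
    s_b = r * sum((m - grand) ** 2 for m in means)                      # (9.5.3)
    s_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)   # (9.5.4)
    ms_b = s_b / (k - 1)
    ms_w = s_w / (k * (r - 1))
    var_error = ms_w                    # estimate of sigma^2
    var_alpha = (ms_b - ms_w) / r       # estimate of sigma_alpha^2, may be negative
    intra_class = var_alpha / (var_alpha + var_error)
    return {
        "MS_b": ms_b, "MS_w": ms_w, "F": ms_b / ms_w,
        "var_error": var_error, "var_alpha": var_alpha,
        "intra_class": intra_class,
    }


# Hypothetical data: k = 4 samples of r = 3 observations each.
groups = [[10, 12, 11], [14, 15, 16], [9, 8, 10], [13, 13, 13]]
est = variance_components_one_way(groups)
for name, value in est.items():
    print(f"{name:>12}: {value:.4f}")
```

The last entry is the estimated intra-class correlation coefficient, obtained by inserting the two estimates into $\sigma_\alpha^2/(\sigma_\alpha^2 + \sigma^2)$.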

The power of this test is a function of $\sigma_\alpha^2/\sigma^2$. If $F$ has the ordinary
F-distribution with $k - 1$ and $k(r - 1)$ d.f., and if $F_\alpha$ is the value of $F$ exceeded with
probability $\alpha$, then the power is given by

(9.5.8)    $P = \Pr\left\{ F \ge F_\alpha \dfrac{\sigma^2}{r\sigma_\alpha^2 + \sigma^2} \right\}$


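For the two-way layout of § 9.3, a similar set of assumptions leads to the
model

(9.5.9)    $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$

where the $\alpha_i$, the $\beta_j$, the $\gamma_{ij}$ and the $\varepsilon_{ijk}$ are independently and normally
distributed with zero expectations and variances $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$ and $\sigma^2$, respectively.
The variance of $x_{ijk}$ is then given by

(9.5.10)    $V(x_{ijk}) = \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_\gamma^2 + \sigma^2$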
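The sum of squares for A-effects is

(9.5.11)    $S_a = br \sum_i (\bar{x}_{i..} - \bar{x})^2 = br \sum_i (\alpha_i + \bar{\gamma}_{i.} + \bar{\varepsilon}_{i..} - \bar{\alpha} - \bar{\gamma} - \bar{\varepsilon})^2$

If $\eta_i = \alpha_i + \bar{\gamma}_{i.} + \bar{\varepsilon}_{i..}$, then $\eta_i$ is normal with expectation 0 and variance
$\sigma_\alpha^2 + \sigma_\gamma^2/b + \sigma^2/(br)$. Also its mean over the $a$ values of $i$ is $\bar{\eta} = \bar{\alpha} + \bar{\gamma} + \bar{\varepsilon}$.
Therefore,

    $\dfrac{\sum_i (\eta_i - \bar{\eta})^2}{\sigma_\alpha^2 + \sigma_\gamma^2/b + \sigma^2/(br)}$  is $\chi^2$ with $a - 1$ d.f., and

(9.5.12)    $E[S_a/(a - 1)] = br(\sigma_\alpha^2 + \sigma_\gamma^2/b + \sigma^2/(br)) = br\sigma_\alpha^2 + r\sigma_\gamma^2 + \sigma^2$.

A similar argument leads to the results:

(9.5.13)    $E\left[ \dfrac{S_b}{b - 1} \right] = ar\sigma_\beta^2 + r\sigma_\gamma^2 + \sigma^2$

(9.5.14)    $E\left[ \dfrac{S_{ab}}{(a - 1)(b - 1)} \right] = r\sigma_\gamma^2 + \sigma^2$

(9.5.15)    $E\left[ \dfrac{S_r}{ab(r - 1)} \right] = \sigma^2$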

The four sums of squares are statistically independent and therefore ordinary
F-tests of the hypotheses $\sigma_\alpha^2 = 0$, $\sigma_\beta^2 = 0$, $\sigma_\gamma^2 = 0$ may be carried out.
It should be noted that, contrary to the conclusions from the fixed-effects
model, the interaction component of variance appears in the mean squares for
A-effects and B-effects. The F-test for the null hypothesis $\sigma_\alpha^2 = 0$ must therefore
be carried out by comparing the mean squares for A-effects and interaction, and
similarly for $\sigma_\beta^2 = 0$. If $r > 1$, we can test for interaction by an F-test of
Eq. (14) against Eq. (15).
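
The choice of denominators under Model II can be made explicit in a few lines. In the sketch below the function name and the returned layout are illustrative, and the mean squares of Table 9.4 are reused purely as arithmetic: treating both machines and days as random is a hypothetical reading adopted only for this illustration.

```python
def model2_f_ratios(ms_a, ms_b, ms_ab, ms_r, a, b, r):
    """F-ratios for the two-way random-effects layout (Model II).

    Main effects are compared with the interaction mean square, and the
    interaction with the replicate (error) mean square, in line with the
    expectations in Eqs. (9.5.12) to (9.5.15). Each entry is
    (F, numerator d.f., denominator d.f.).
    """
    df_a, df_b = a - 1, b - 1
    df_ab = (a - 1) * (b - 1)
    df_r = a * b * (r - 1)
    return {
        "A": (ms_a / ms_ab, df_a, df_ab),
        "B": (ms_b / ms_ab, df_b, df_ab),
        "AB": (ms_ab / ms_r, df_ab, df_r),
    }


# Mean squares from Table 9.4, used here as a hypothetical Model II case.
ratios = model2_f_ratios(0.4584, 0.0692, 0.0541, 0.0373, a=3, b=9, r=3)
for effect, (f_value, df1, df2) in ratios.items():
    print(f"{effect:>2}: F = {f_value:.2f} with {df1} and {df2} d.f.")
```

Compared with the fixed-effects analysis, only the denominators of the two main-effect ratios change; the interaction test is the same.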

Similar but more complicated models can be used when there are three main 
effects, with three second-order interactions and one third-order interaction. 
With Model II a complication arises when the interactions are not negligible,
because it turns out to be impossible to apply the F-test directly in order to test
for the main effects. An approximate method suggested by Satterthwaite may
be tried in such cases [5], [6].


* 9.6 Mixed Models (Model III) A layout in which it seems reasonable to
regard one effect as fixed and another as random is said to be mixed. In a 
problem concerned with the daily output of workers in a factory using certain 
machines [7], we might be inclined to regard the workers as a random sample 
from a large population, but we might be interested in the performance of 
individual machines, perhaps of different makes. 

Let $x_{ijk}$ be the output, say, for the $j$th worker on the $k$th day that he is assigned
to the $i$th machine ($i = 1, 2 \ldots a$, $j = 1, 2 \ldots b$, $k = 1, 2 \ldots r$). The days will
be regarded merely as replicates, the effects in which we are interested being
the fixed effects $\alpha_i$ of machines and the random effects $\beta_j$ of workers as well
as their possible interactions. We assume that

(9.6.1)    $x_{ijk} = m_{ij} + \varepsilon_{ijk}$

where $m_{ij}$ is the “true” mean output of the $j$th worker on the $i$th machine, and
the $\varepsilon_{ijk}$ are independent normal variates with mean zero and variance $\sigma^2$. Since
the $j$th worker is regarded as a random selection from a large population of
workers, we can think of $m_{ij}$ as a particular value of a random variable $M_i$
which represents the mean output of a worker selected at random on the
machine numbered $i$.

Let the expected value of $M_i$ over the population of workers be denoted
by $\mu_i$ and let the arithmetic mean of the $\mu_i$ over the $a$ machines be denoted by $\mu$.
Then

(9.6.2)    $\mu_i = E(M_i)$,    $\mu = \dfrac{1}{a} \sum_i \mu_i$

The main effect of the $i$th machine is

(9.6.3)    $\alpha_i = \mu_i - \mu$,    $\sum_i \alpha_i = 0$

Suppose $m_{iw}$ is the value of $M_i$ for any worker labelled $w$ in the whole
population of workers. The true mean for this worker is the average of $m_{iw}$
over the $a$ machines and the main effect $\beta_w$ of worker $w$ is the excess of this
over the general mean.

(9.6.4)    $\beta_w = \bar{m}_{.w} - E(\bar{M}_{.}) = \bar{m}_{.w} - \mu$
This is a random variable with expectation 0, and variance, say, $\sigma_\beta^2$.

The main effect of worker $w$, specific to the $i$th machine, may be defined as
$m_{iw} - E(M_i) = m_{iw} - \mu_i$ and the excess of this above its average over the
machines is called the interaction of the $i$th machine and the particular worker $w$,
namely,

(9.6.5)    $\gamma_{iw} = m_{iw} - \mu_i - \bar{m}_{.w} + \mu$

For each $i$, this is a random variable with expectation zero and variance $\sigma_{\gamma_i}^2$.
Also, $\sum_i \gamma_{iw} = 0$ for each value of $w$.
From Eqs. (3), (4) and (5) we obtain

(9.6.6)    $m_{iw} = \mu + \alpha_i + \beta_w + \gamma_{iw}$

and, for any worker $w$ and any day $d$,

(9.6.7)    $x_{iwd} = \mu + \alpha_i + \beta_w + \gamma_{iw} + \varepsilon_{iwd}$

where the $\varepsilon_{iwd}$ are independent of each other and of $\beta_w$ and $\gamma_{iw}$, and have a
common variance $\sigma^2$. The $\gamma_{iw}$, for different values of $i$, are not necessarily
independent of each other or of $\beta_w$, but have covariances depending on those
of the random variables $M_i$.
The $b$ workers actually used in the experiment may be regarded as a random
sample from the whole population of workers, and the $r$ days similarly form a
sample of all possible days. The $x_{ijk}$ of Eq. (1) has therefore the same form as
the $x_{iwd}$ of Eq. (7), but $j$ takes only the values $1, 2 \ldots b$, and $k$ takes only the
values $1, 2 \ldots r$. That is,

(9.6.8)    $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$

The $\beta_j$ for different values of $j$ may be looked on as independent variates all
having the same distribution as $\beta_w$, and the $\gamma_{ij}$ are similarly independent with
the same distribution as $\gamma_{iw}$ for any $i$. The $\varepsilon_{ijk}$ can be regarded as independent
of each other and of the $\beta_j$ and the $\gamma_{ij}$.
For convenience of notation we may define $\sigma_\alpha^2$ by the relation

(9.6.9)    $(a - 1)\sigma_\alpha^2 = \sum_i \alpha_i^2$

but it must be remembered that we are treating the $\alpha_i$ as fixed effects, so that
$\sigma_\alpha^2$ is not the variance of a random variable. Also we will define $\sigma_\gamma^2$ by

(9.6.10)    $(a - 1)\sigma_\gamma^2 = \sum_i \sigma_{\gamma_i}^2$


The division of the total sum of squares into four parts may be carried out
just as in Model I or Model II. We get

(9.6.11)    $S_t = S_a + S_b + S_{ab} + S_r$

where

            $S_a = br \sum_i (\bar{x}_{i..} - \bar{x})^2$
            $S_b = ar \sum_j (\bar{x}_{.j.} - \bar{x})^2$
(9.6.12)    $S_{ab} = r \sum_{ij} (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x})^2$
            $S_r = \sum_{ijk} (x_{ijk} - \bar{x}_{ij.})^2$

It is straightforward to show, as in § 9.4, that if the $\varepsilon_{ijk}$, $\beta_j$ and $\gamma_{ij}$ are
normally distributed

(9.6.13)    $E\left[ \dfrac{S_a}{a - 1} \right] = br\sigma_\alpha^2 + r\sigma_\gamma^2 + \sigma^2$


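Also, by using Eq. (8) in the second equation of (12) and noting that $\bar{\gamma}_{.j} = 0$
for all $j$, since $\sum_i \gamma_{ij} = 0$, we find

(9.6.14)    $S_b = ar \sum_j [(\beta_j - \bar{\beta}) + (\bar{\varepsilon}_{.j.} - \bar{\varepsilon})]^2$
                $= ar \sum_j (\beta_j - \bar{\beta})^2 + ar \sum_j (\bar{\varepsilon}_{.j.} - \bar{\varepsilon})^2 + 2ar \sum_j (\beta_j - \bar{\beta})(\bar{\varepsilon}_{.j.} - \bar{\varepsilon})$

The expectation of $\sum_j (\beta_j - \bar{\beta})^2$ is $(b - 1)\sigma_\beta^2$. Since $\bar{\varepsilon}_{.j.}$ is normal with
variance $\sigma^2/ar$, the expectation of $\sum_j (\bar{\varepsilon}_{.j.} - \bar{\varepsilon})^2$ is similarly $(b - 1)\sigma^2/ar$. The
expectation of the product term vanishes because of the independence of $\beta_j$
and $\varepsilon_{ijk}$. Therefore,

(9.6.15)    $E(S_b) = ar(b - 1)\sigma_\beta^2 + (b - 1)\sigma^2$

so that

(9.6.16)    $E\left[ \dfrac{S_b}{b - 1} \right] = ar\sigma_\beta^2 + \sigma^2$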


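In the same way we can write

(9.6.17)    $S_{ab} = r \sum_{ij} (\gamma_{ij} + \bar{\varepsilon}_{ij.} - \bar{\gamma}_{i.} - \bar{\varepsilon}_{i..} - \bar{\varepsilon}_{.j.} + \bar{\varepsilon})^2$

Now $E \sum_j (\gamma_{ij} - \bar{\gamma}_{i.})^2 = (b - 1)\sigma_{\gamma_i}^2$ since the $\gamma_{ij}$ for given $i$ are independent
variates with variance $\sigma_{\gamma_i}^2$. It follows that $E \sum_{ij} (\gamma_{ij} - \bar{\gamma}_{i.})^2 = (a - 1)(b - 1)\sigma_\gamma^2$.
Let us define a variate $\eta_{ij}$ by

    $\eta_{ij} = \bar{\varepsilon}_{ij.} - \bar{\varepsilon}_{i..}$

Then $\eta_{ij} - \bar{\eta}_{.j} = \bar{\varepsilon}_{ij.} - \bar{\varepsilon}_{i..} - \bar{\varepsilon}_{.j.} + \bar{\varepsilon}$.
Since $\bar{\varepsilon}_{ij.}$ has variance $\sigma^2/r$, the variance of $\eta_{ij}$ is $(b - 1)\sigma^2/br$ (see § 2.14). The
expectation of $\sum_i (\eta_{ij} - \bar{\eta}_{.j})^2$ is therefore $(a - 1)(b - 1)\sigma^2/br$, and that of
$\sum_{ij} (\eta_{ij} - \bar{\eta}_{.j})^2$ is $b$ times as great. From Eq. (17),

    $S_{ab} = r \sum_{ij} [(\gamma_{ij} - \bar{\gamma}_{i.})^2 + (\eta_{ij} - \bar{\eta}_{.j})^2 + 2(\gamma_{ij} - \bar{\gamma}_{i.})(\eta_{ij} - \bar{\eta}_{.j})]$

The expectation of the product term is zero, because of the independence
of $\gamma_{ij}$ and $\varepsilon_{ijk}$, and therefore we have

    $E(S_{ab}) = r(a - 1)(b - 1)\sigma_\gamma^2 + (a - 1)(b - 1)\sigma^2$

Dividing by $(a - 1)(b - 1)$, we obtain the expectation of the mean square,

(9.6.18)    $E\left[ \dfrac{S_{ab}}{(a - 1)(b - 1)} \right] = r\sigma_\gamma^2 + \sigma^2$

Finally, $S_r = \sum_{ijk} (\varepsilon_{ijk} - \bar{\varepsilon}_{ij.})^2$ and $E(S_r) = ab(r - 1)\sigma^2$, so that

(9.6.19)    $E\left[ \dfrac{S_r}{ab(r - 1)} \right] = \sigma^2$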

The analysis of variance for the mixed Model III is set out in Table 9.5. 


TABLE 9.5

Source of Variation             D.F.                M.S.                          E(M.S.)
A-effects (fixed)               $a - 1$             $S_a/(a - 1)$                 $br\sigma_\alpha^2 + r\sigma_\gamma^2 + \sigma^2$
B-effects (random)              $b - 1$             $S_b/(b - 1)$                 $ar\sigma_\beta^2 + \sigma^2$
A × B-effects (interaction)     $(a - 1)(b - 1)$    $S_{ab}/[(a - 1)(b - 1)]$     $r\sigma_\gamma^2 + \sigma^2$
Error                           $ab(r - 1)$         $S_r/[ab(r - 1)]$             $\sigma^2$
Total                           $abr - 1$


The four sums of squares are pairwise independent, except for the pair $S_a$
and $S_{ab}$. We can therefore test for interaction by comparing the mean squares
for interaction and error, test for B-effects by comparing the mean squares for
B-effects and error, and test for A-effects by comparing the mean squares for
A-effects and interaction.

Estimates of $\sigma_\alpha^2$, $\sigma_\beta^2$, $\sigma_\gamma^2$ and $\sigma^2$ can be calculated from the last column of
Table 9.5. The estimator of $\mu$ is the over-all mean $\bar{x}$, and estimators of $\alpha_i$, $\beta_j$
and $\gamma_{ij}$ are, respectively,

(9.6.20)    $\hat{\alpha}_i = \bar{x}_{i..} - \bar{x}$
(9.6.21)    $\hat{\beta}_j = \bar{x}_{.j.} - \bar{x}$
(9.6.22)    $\hat{\gamma}_{ij} = \bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x}$
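
One way to obtain the estimates mentioned above is to set each observed mean square equal to its expectation in the last column of Table 9.5 and solve the four equations in turn. The sketch below follows that route. The function name is illustrative, the numerical input is the set of mean squares from Table 9.4 (machines fixed, days random, in line with the remarks on Example 2 in § 9.4), and nothing prevents a moment estimate from being negative in other data.

```python
def mixed_model_components(ms_a, ms_b, ms_ab, ms_r, a, b, r):
    """Solve the E(M.S.) column of Table 9.5 for the four components.

    Note that sigma_alpha^2 is only the notational quantity of Eq. (9.6.9),
    since the A-effects are fixed.
    """
    var_error = ms_r                        # sigma^2
    var_gamma = (ms_ab - ms_r) / r          # from r*sigma_gamma^2 + sigma^2
    var_beta = (ms_b - ms_r) / (a * r)      # from ar*sigma_beta^2 + sigma^2
    var_alpha = (ms_a - ms_ab) / (b * r)    # from br*sigma_alpha^2 + r*sigma_gamma^2 + sigma^2
    return {
        "sigma2": var_error,
        "sigma_gamma2": var_gamma,
        "sigma_beta2": var_beta,
        "sigma_alpha2": var_alpha,
    }


# Mean squares of Table 9.4: a = 3 machines, b = 9 days, r = 3 sheets.
components = mixed_model_components(0.4584, 0.0692, 0.0541, 0.0373, a=3, b=9, r=3)
for name, value in components.items():
    print(f"{name:>13}: {value:.4f}")
```

The order of subtraction mirrors the tests described above: A-effects are set against interaction, while B-effects and interaction are set against error.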
* 9.7 Nested (or Hierarchal) Models One type of incomplete design is that
in which a factor B is "nested" within another factor А. This means that for 
each level of A there is a set of levels of B, and each B-level occurs in just one 
A-level. In the type of design considered earlier in this chapter, each B-level 
occurs in each A-level, and A and В are said to be completely crossed. If this is 
not so but if some B-level occurs in at least two A-levels, the factors are said 
to be partly crossed. 

As an example, we may consider an experiment on the determination of 
the protein content of wheat. From many suitable laboratories in Canada, 
three were selected at random and, in each of these, five samples of wheat were 
analyzed on each of two days. All the 30 wheat samples were parts of one 
carefully selected master sample and were thoroughly mixed and randomized. 
There is presumably a main effect between the different laboratories, and also 
between days in each laboratory, but day number 1 in one laboratory has
nothing whatever necessarily in common with day number 1 in another
laboratory. In fact, we might not even use the same number of days in the different
laboratories. The day-effect is said to be “nested” within the laboratory effect, 
or, to put it another way, the day-effect has a lower rank in the hierarchy of 
effects than the laboratory effect. This is why the nested model is sometimes 
called “hierarchal.” There is no need for any interaction term in this model, 
since no B-level occurs with more than one A-level. 

We will suppose that the A-effect occurs at $a$ levels, denoted by $i$ ($i = 1$,
$2 \ldots a$), and that, nested within the $i$th level, there are $b_i$ B-levels, denoted by $j$
($j = 1, 2 \ldots b_i$). In the illustration above, the $b_i$ were all equal to 2, but this
is not necessary. We suppose also that corresponding to any given $i$ and $j$
there are $r_{ij}$ replicates of the measurement of a variate $X$. The $r_{ij}$ can be
different for each pair of $i$ and $j$, but it will be convenient to assume that they
are all equal to $r$. The measurement on the $k$th replicate will then be denoted
by $x_{ijk}$ and is given by

(9.7.1)    $x_{ijk} = m_{ij} + \varepsilon_{ijk}$

where the $\varepsilon_{ijk}$ can be regarded as “errors,” which are independently and
normally distributed with expectation 0 and variance $\sigma^2$, and are independent of
the $m_{ij}$. The $m_{ij}$ are the expectations (or true values) for the $i$th A-level and
the $j$th B-level.

We may use either Model I, II or III for the effects. If we suppose, in the 
illustration mentioned earlier, that the laboratories are chosen at random from 
a large number, and if the days are also (as far as possible) random, we should 
use Model II, and this model will be assumed in the rest of this section. There 
is then a large population of A-levels, denoted in general by $u$, and for the $u$th 
level there is a large population of B-levels nested within it and denoted in 
general by $v$. The expectation of X, measured for the $v$th sub-level of the $u$th 
level, will be the random variable $m_{uv}$. The $m_{ij}$ of Eq. (1) is the value of $m_{uv}$ 
for $u = i$ and $v = j$. 

If the conditional expectation of $m_{uv}$ for fixed $u$ is denoted by $m_{u.}$, and the 
expectation of $m_{u.}$ is denoted by $m_{..}$, we may write 

(9.7.2)  $m_{uv} = \mu + \alpha_u + \beta_{uv}$ 

where 

(9.7.3)  $\mu = m_{..}$,  $\alpha_u = m_{u.} - m_{..}$,  $\beta_{uv} = m_{uv} - m_{u.}$ 


As usual, $\mu$ is the over-all expectation of X. The random variable $\alpha_u$ has 
expectation 0 and variance $\sigma_\alpha^2$. The random variable $\beta_{uv}$ has a conditional 
expectation (for given $u$) zero and conditional variance $\sigma_{\beta u}^2$. These variables 
are uncorrelated, as may be shown with the help of a theorem on conditional 
expectations in Appendix A.14. The proof goes as follows: 

$E(\alpha_u \beta_{uv} \mid u) = \alpha_u E(\beta_{uv} \mid u) = 0$ 

and therefore, by Eq. (A.14.4), $E(\alpha_u \beta_{uv}) = 0$, which shows that the correlation 
is zero, both variables having zero expectations. 

For the variance of $\beta_{uv}$, by Eq. (A.14.6), we have 

$V(\beta_{uv}) = V[E(\beta_{uv} \mid u)] + E[V(\beta_{uv} \mid u)]$ 

The first term on the right-hand side is zero; hence, 

(9.7.4)  $V(\beta_{uv}) = E(\sigma_{\beta u}^2) = \sigma_\beta^2$ 


The A-levels actually selected in the experiment may be denoted by $u_i$ 
($i = 1, 2, \ldots, a$), and the B-levels corresponding to $u_i$ by $v_j$ ($j = 1, 2, \ldots, b_i$). 
Then 

(9.7.5)  $m_{ij} = \mu + \alpha_i + \beta_{ij}$ 

where $\alpha_i$ is the value of $\alpha_u$ for $u = u_i$ and $\beta_{ij}$ the value of $\beta_{uv}$ for $u = u_i$ and 
$v = v_j$. These are uncorrelated and are both independent of $\varepsilon_{ijk}$. We now 
suppose that they are also normally distributed. 
The total sum of squares $S_t$ is divided into three parts: 

(9.7.6)  $S_t = S_a + S_b + S_e$ 

where 

(9.7.7)  $S_a = r \sum_i b_i (\bar{x}_{i..} - \bar{x})^2$ 
         $S_b = r \sum_i \sum_j (\bar{x}_{ij.} - \bar{x}_{i..})^2$ 
         $S_e = \sum_{ijk} (x_{ijk} - \bar{x}_{ij.})^2$ 

It should be noted that the mean over $i$ is a weighted mean when the $b_i$ are not 
all equal. Thus, 

$\bar{x} = \frac{r}{N} \sum_i b_i \bar{x}_{i..}$ 

where $N = r \sum_i b_i$. 

These three sums of squares are statistically independent. Details of the 
proof of this statement may be found in Scheffé's book [7]. 

The quantity $S_e/\sigma^2$, under the normality assumption, is a $\chi^2$-variate with 
$(r - 1) \sum_i b_i$ degrees of freedom. Also $S_b/(\sigma^2 + r\sigma_\beta^2)$ is a $\chi^2$-variate with 
$\sum_i (b_i - 1)$ d.f. However, $S_a$ is not in general equal to a constant times a 
chi-square variate. If it happens that $\sigma_\alpha^2 = 0$ (which means that there are no 
A-effects), it may be shown that $S_a/(\sigma^2 + r\sigma_\beta^2)$ is distributed like $\chi^2$ with $a - 1$ 
d.f. Also, if all the $b_i$ are equal to $b$ (as in the example of the laboratory analyses 
described above), $S_a/(\sigma^2 + r\sigma_\beta^2 + br\sigma_\alpha^2)$ is distributed like $\chi^2$ with $a - 1$ d.f. 
In the more general case, the expectation of $S_a$ is given by 

(9.7.8)  $E(S_a) = (a - 1)(\sigma^2 + r\sigma_\beta^2) + (a - 1)A\sigma_\alpha^2$ 

where 

(9.7.9)  $(a - 1)A = r \left( \sum_i b_i - \frac{\sum_i b_i^2}{\sum_i b_i} \right)$ 

which, when $b_i = b$, reduces to $br(a - 1)$. 
The analysis of variance table is, therefore, as given in Table 9.6. 
The analysis of variance table is, therefore, as given in Table 9.6. 


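TABLE 9.6 
Variation                | S.S.  | D.F.                | M.S.                      | E(M.S.) 
A-effects                | $S_a$ | $a - 1$             | $S_a/(a - 1)$             | $\sigma^2 + r\sigma_\beta^2 + A\sigma_\alpha^2$ 
B-effects (nested in A)  | $S_b$ | $\sum_i b_i - a$    | $S_b/(\sum_i b_i - a)$    | $\sigma^2 + r\sigma_\beta^2$ 
Error                    | $S_e$ | $(r - 1)\sum_i b_i$ | $S_e/[(r - 1)\sum_i b_i]$ | $\sigma^2$ 
Total                    | $S_t$ | $r\sum_i b_i - 1$   |                           | 

If $b_i = b$ for all $i$, $A = br$, $\sum_i b_i = ab$. 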

The hypothesis of no B-effects ($\sigma_\beta^2 = 0$) may be tested by an exact F-test 
of the mean square for B-effects divided by the mean square for error. A test 
of the hypothesis $\sigma_\alpha^2 = 0$ may be made by dividing the mean square for A-effects 
by the mean square for B-effects. The power of these tests can be calculated 
as in § 8.16, except for the test of $\sigma_\alpha^2 = 0$ when the $b_i$ are unequal. In this 
case an approximate method must be used. 
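
The partition of the total sum of squares and the two F-ratios described in this section can be sketched numerically. The following is a minimal illustration only, assuming equal replication r in every cell; the function name `nested_anova` and the protein percentages are hypothetical and are not data from the experiment described above.

```python
# Sketch of the nested (hierarchal) analysis with r replicates per cell.
# All data values below are invented for illustration.

def nested_anova(data):
    """data[i][j] holds the r replicates for B-level j nested in A-level i."""
    a = len(data)
    r = len(data[0][0])
    b = [len(level) for level in data]            # b_i may differ between A-levels
    n_total = r * sum(b)                          # N = r * sum of b_i

    cell_mean = [[sum(cell) / r for cell in level] for level in data]
    a_mean = [sum(sum(cell) for cell in level) / (r * b[i])
              for i, level in enumerate(data)]
    grand = sum(x for level in data for cell in level for x in cell) / n_total

    s_a = r * sum(b[i] * (a_mean[i] - grand) ** 2 for i in range(a))
    s_b = r * sum((cell_mean[i][j] - a_mean[i]) ** 2
                  for i in range(a) for j in range(b[i]))
    s_e = sum((x - cell_mean[i][j]) ** 2
              for i in range(a) for j in range(b[i]) for x in data[i][j])
    s_t = sum((x - grand) ** 2 for level in data for cell in level for x in cell)

    df = (a - 1, sum(b) - a, (r - 1) * sum(b))
    ms_a, ms_b, ms_e = s_a / df[0], s_b / df[1], s_e / df[2]
    return {"S_a": s_a, "S_b": s_b, "S_e": s_e, "S_t": s_t, "df": df,
            "F_B": ms_b / ms_e,    # tests the B-effects against error
            "F_A": ms_a / ms_b}    # tests the A-effects against B-effects

# Hypothetical layout: 3 laboratories, 2 days each, 5 samples per day.
labs = [
    [[13.1, 13.4, 13.2, 13.0, 13.3], [13.5, 13.6, 13.4, 13.7, 13.5]],
    [[12.8, 12.9, 13.0, 12.7, 12.9], [13.0, 13.2, 13.1, 12.9, 13.1]],
    [[13.6, 13.8, 13.7, 13.5, 13.9], [13.3, 13.4, 13.2, 13.5, 13.3]],
]
result = nested_anova(labs)
print(result["df"], round(result["F_B"], 2), round(result["F_A"], 2))
```

The three component sums of squares add up to the total, which gives a quick arithmetic check on any hand calculation.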


9.8 Latin Square Designs  In a complete experimental design there is at 
least one observation for every possible combination of levels. The nested 
design is incomplete because each B-level appears in combination with one 
and only one A-level (that level within which it is nested). The Latin square is 
another type of incomplete design in which there are three factors, all with the 
same number of levels $m$, but in which observations are made on only $m^2$ 
instead of the possible $m^3$ combinations. This is obviously economical of effort, 
but the design is rather restrictive (enforcing the same number of levels for each 
factor) and the usual analysis does not allow for interactions. 

A Latin square is a square array of symbols, m to a row or column, and 
such that each symbol appears once and only once in each row and each column. 
Thus in the figure below there are 5 letters arranged in this way. In an agri- 
cultural experiment these might represent 5 varieties of wheat planted in 25 plots 
arranged in 5 rows and 5 columns across a field. The purpose of the arrangement 
is to allow for a possible row-effect or column-effect on the yield, such as might 
be produced if there is a definite fertility gradient across the field in the direction 
of the columns or the rows. In an animal-feeding experiment there might be 5 
types of ration used on 25 animals. If the animals belonged to 5 different litters 
(5 to each litter) and could be kept in 5 types of pens, 5 in each type, we could 
arrange a Latin square experiment in which each type of pen contained just 1 
animal from each litter. The 5 rations would be allocated to the animals 
according to a Latin square, and we could then eliminate the effect of litters 
and of pens on the gains in weight. 

A Latin square remains a Latin square when the rows, or the columns, are 
permuted among themselves. A standard m by m square is one in which the 
first row and the first column contain the m symbols in their natural order, as 
in the right-hand 5 by 5 square below. 

E A C B D        A B C D E 
B C E D A        B E A C D 
A B D E C        C D B E A 
C D B A E        D C E A B 
D E A C B        E A D B C 


With 5 symbols there are 56 different standard squares, and from each of these 
we may obtain 2880 Latin squares by permuting rows and columns. The 
number of possible Latin squares increases enormously as m increases. In an 
experimental design we should choose a Latin square at random from the 
whole set of possibilities. We could, for instance, first choose a standard square 
from a complete set of standard squares, such as is found in Fisher and Yates' 
tables [8] for the smaller values of m, and then permute at random the columns 
and the rows (except the first row). For this purpose a table of random permu- 
tations of the numbers 1 to 9 may be used—see, for example, [9]. 
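
The randomization just described can be sketched as follows. This is an illustration only: the standard square is the right-hand 5 by 5 square as read from the text, the helper names `randomize` and `is_latin` are hypothetical, and a pseudo-random generator stands in for a table of random permutations.

```python
import random

# Standard square (first row and first column in natural order).
STANDARD = ["ABCDE", "BEACD", "CDBEA", "DCEAB", "EADBC"]

def randomize(square, rng):
    """Permute all columns and all rows except the first, at random."""
    m = len(square)
    cols = list(range(m))
    rng.shuffle(cols)
    rows = [0] + rng.sample(range(1, m), m - 1)   # the first row stays in place
    return ["".join(square[i][j] for j in cols) for i in rows]

def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    m = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == m and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(m))
    return len(symbols) == m and rows_ok and cols_ok

design = randomize(STANDARD, random.Random(0))   # seed fixed only for repeatability
for row in design:
    print(" ".join(row))
```

Permuting rows and columns cannot destroy the Latin property, so `is_latin(design)` holds whatever permutation is drawn.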
The mathematical model for a Latin square design is 


(9.8.1)  $x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}$ 

where $i, j, k$ take values from 1 to $m$, but where only $m^2$ sets of triples $(i, j, k)$ 
are permissible, these being dictated by the particular Latin square used. The 
$\alpha_i$ are the main effects for the first factor (say treatments), the $\beta_j$ are those for 
the second factor (say rows) and the $\gamma_k$ those for the third factor (say columns). 
The $\varepsilon_{ijk}$ are assumed to be independently normal with expectation 0 and variance 
$\sigma^2$. The mean values for $\alpha_i$, $\beta_j$ and $\gamma_k$ are adjusted to zero as usual. This model 
ignores any interaction between the three main effects. The null hypotheses to 
be tested are (a) all $\alpha_i = 0$, (b) all $\beta_j = 0$, (c) all $\gamma_k = 0$. 

We will assume that the effects are fixed, as in Model I. The total sum of 
squares $S_t$ may be divided into four parts: 

(9.8.2)  $S_t = S_a + S_b + S_c + S_e$ 

where 

(9.8.3)  $S_t = \sum_{(i,j,k) \in D} (x_{ijk} - \bar{x})^2 = \sum_D x_{ijk}^2 - G$ 

$D$ being the actual set of triples $(i, j, k)$ appearing in the square, and 

(9.8.4)  $S_a = m \sum_i (\bar{x}_{i..} - \bar{x})^2 = m \sum_i \bar{x}_{i..}^2 - G$ 
         $S_b = m \sum_j (\bar{x}_{.j.} - \bar{x})^2 = m \sum_j \bar{x}_{.j.}^2 - G$ 
         $S_c = m \sum_k (\bar{x}_{..k} - \bar{x})^2 = m \sum_k \bar{x}_{..k}^2 - G$ 

Here $\bar{x}$ is the arithmetic mean of the $m^2$ observations and is an unbiased 
estimator for $\mu$. $G$ is the quantity $m^2 \bar{x}^2$. Also $\bar{x}_{i..}$ is the average of the $m$ 
observations on the $i$th treatment, $\bar{x}_{.j.}$ is the average of the $m$ observations in the $j$th 
row, and $\bar{x}_{..k}$ is the average of the $m$ observations in the $k$th column. The sum 
of squares for error $S_e$ is calculated by difference from Eqs. (2), (3) and (4). 

The number of degrees of freedom for treatments, rows and columns is $m - 1$ 
in each case. Since the total number is $m^2 - 1$, there are $m^2 - 3m + 2$ degrees 
of freedom for error. The expectations of the mean squares are given in 
Table 9.7, where $\sigma_\alpha^2$ is a symbol for $\sum_i \alpha_i^2/(m - 1)$ and similarly for $\sigma_\beta^2$ 
and $\sigma_\gamma^2$. 

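TABLE 9.7 
Analysis of Variance for m x m Latin Square 

Variation        | S.S.  | D.F.           | M.S.                   | E(M.S.) 
Treatments (A)   | $S_a$ | $m - 1$        | $S_a/(m - 1)$          | $\sigma^2 + m\sigma_\alpha^2$ 
Rows (B)         | $S_b$ | $m - 1$        | $S_b/(m - 1)$          | $\sigma^2 + m\sigma_\beta^2$ 
Columns (C)      | $S_c$ | $m - 1$        | $S_c/(m - 1)$          | $\sigma^2 + m\sigma_\gamma^2$ 
Error            | $S_e$ | $m^2 - 3m + 2$ | $S_e/(m^2 - 3m + 2)$   | $\sigma^2$ 
Total            | $S_t$ | $m^2 - 1$      |                        | 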
The first four sums of squares in the above table are statistically independent, 
so that F-tests of the three null hypotheses mentioned above may be applied 
to test for significant effects of treatments, rows or columns. 
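
As a rough sketch of the arithmetic behind Table 9.7, the sums of squares may be computed from a layout and a table of observations as below. The function name `latin_square_ss`, the cyclic 4 by 4 layout, the effect sizes and the "noise" terms are all hypothetical, chosen only to make the example self-contained.

```python
# Sums of squares for an m x m Latin square, following Eqs. (9.8.2)-(9.8.4).

def latin_square_ss(layout, y):
    """layout[j][k] = treatment index in row j, column k; y[j][k] = observation."""
    m = len(layout)
    grand = sum(sum(row) for row in y) / m ** 2
    G = m ** 2 * grand ** 2                       # correction term
    treat_total = [0.0] * m
    for j in range(m):
        for k in range(m):
            treat_total[layout[j][k]] += y[j][k]
    treat_mean = [t / m for t in treat_total]
    row_mean = [sum(y[j]) / m for j in range(m)]
    col_mean = [sum(y[j][k] for j in range(m)) / m for k in range(m)]

    s_t = sum(v * v for row in y for v in row) - G
    s_a = m * sum(v * v for v in treat_mean) - G
    s_b = m * sum(v * v for v in row_mean) - G
    s_c = m * sum(v * v for v in col_mean) - G
    return {"S_t": s_t, "S_a": s_a, "S_b": s_b, "S_c": s_c,
            "S_e": s_t - s_a - s_b - s_c,         # error by difference
            "df_e": (m - 1) * (m - 2)}

m = 4
layout = [[(j + k) % m for k in range(m)] for j in range(m)]   # a cyclic Latin square
alpha = [-2.0, -1.0, 1.0, 2.0]     # hypothetical treatment effects
beta = [1.5, -0.5, -1.5, 0.5]      # hypothetical row effects
gamma = [0.5, 1.0, -1.0, -0.5]     # hypothetical column effects
noise = [[0.3, -0.2, 0.1, -0.4], [-0.1, 0.4, -0.3, 0.2],
         [0.2, -0.3, 0.4, -0.1], [-0.4, 0.1, -0.2, 0.3]]
y = [[20 + alpha[layout[j][k]] + beta[j] + gamma[k] + noise[j][k]
      for k in range(m)] for j in range(m)]

ss = latin_square_ss(layout, y)
f_treat = (ss["S_a"] / (m - 1)) / (ss["S_e"] / ss["df_e"])
print(ss, f_treat)
```

With perfectly additive data (no noise) the error sum of squares vanishes, which is a convenient check that the layout and the observations have been matched correctly.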

If there are interactions between these effects the interpretation of the results 
of the experiment is complicated and difficult. Usually the expectation of the 
error mean square will be increased, but the effect of interaction on the other 
mean squares may be to increase or reduce them. 


* 9.9 Balanced Incomplete Block Designs  In the complete randomized block 
design of § 9.3, every treatment appears in every block, possibly with several 
replications as well. But it is sometimes advisable to have smaller blocks, the 
size of which is dictated by special circumstances. If one were carrying out 
tests on the life of automobile tires of various makes, the four wheels of a car 
would form a natural “block.” 

We shall consider a fixed-effects model, even though it might well be more 
reasonable in some cases to think of the blocks as randomly chosen from a 
large population of blocks (this would be true for the car tires, for example, 
in the illustration above). We suppose that the blocks are all of the same size, 
that each treatment occurs r times, and that no treatment appears twice in the 
same block. Then if there are $k$ “plots” (or items) in a block, $b$ blocks, and 
$a$ treatments, we obviously have 

(9.9.1)  $N = ar = bk$ 

where $k < a$. 

A design is said to be balanced if each pair of treatments occurs in the same 
number of blocks. If this number is denoted by $\lambda$, it follows that in a balanced 
design 

(9.9.2)  $\lambda(a - 1) = (k - 1)r$ 

In the example illustrated in Table 9.8 there are seven treatments (denoted by 
letters A to G), and seven blocks, each of size 4, so that $\lambda = 2$. The pair BD, 
for example, occurs in blocks 5 and 7 only, and similarly for every other pair. 
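
The counting conditions can be checked mechanically for any proposed design. The sketch below uses a cyclic design with the same parameters as the example (seven treatments, seven blocks of size 4); it is an assumed stand-in built from residues modulo 7 and is not claimed to be the arrangement of Table 9.8.

```python
from itertools import combinations

a, k = 7, 4
# Hypothetical cyclic design: block j contains treatments j+2, j+4, j+5, j+6 (mod 7).
blocks = [sorted((j + d) % a for d in (2, 4, 5, 6)) for j in range(a)]
b = len(blocks)

# Replication of each treatment and concurrence of each pair of treatments.
replications = [sum(t in blk for blk in blocks) for t in range(a)]
pair_counts = {pair: sum(set(pair) <= set(blk) for blk in blocks)
               for pair in combinations(range(a), 2)}

r = replications[0]
lam = pair_counts[(0, 1)]
balanced = (len(set(replications)) == 1 and len(set(pair_counts.values())) == 1)

print("r =", r, "lambda =", lam, "balanced:", balanced)
print("N = ar = bk:", a * r == b * k)
print("lambda(a - 1) = (k - 1)r:", lam * (a - 1) == (k - 1) * r)
```

Treatments are coded 0 to 6 here; relabelling them A to G does not affect any of the counts.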


TABLE 9.8 

Block Number 
1 2 3 4 5 6 7 

[Table body: four of the treatments A to G in each of the seven blocks; the entries are illegible in this copy.] 


The analysis is considerably simplified when the design is balanced. A further 
necessary condition for such a design is 

(9.9.3)  $a \le b$ 

which implies $r \ge k$. Many designs satisfying conditions (1), (2) and (3) may 
be found in reference [9], Chapter 11. Once a suitable design has been selected, 
the numbering of the treatments and of the blocks, and the positions within 
the blocks, may all be randomized. 

If we want to balance out the positions by ensuring that each treatment 
occurs just m times in each of the k positions in a block, we must add a fourth 
condition 


(9.9.4)  $b = ma$ 

or $r = km$. This is satisfied in Table 9.8, with $m = 1$. Treatment A, for instance, 
occurs just once in each of the four positions in a block. 
The mathematical model is 


(9.9.5)  $x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$ 

where the $\alpha_i$ are treatment effects, and the $\beta_j$ are block effects and, as usual, 
$\sum_i \alpha_i = \sum_j \beta_j = 0$. It is assumed that there is no interaction. The $\varepsilon_{ij}$ are 
independently normal with expectation zero and variance $\sigma^2$, but it must be observed 
that only some of the possible pairs $i, j$ correspond to actual observations. If 
$K_{ij}$ is the number of times the $i$th treatment occurs in the $j$th block, $K_{ij}$ is either 
0 or 1, and the observed values of $x_{ij}$ correspond to those pairs for which 
$K_{ij} = 1$ (there are $ar$ of these, out of $ab$ pairs altogether). 

If we define the $i$th treatment total $g_i$ and the $j$th block total $h_j$ by 

(9.9.6)  $g_i = \sum_j x_{ij}$,  $h_j = \sum_i x_{ij}$ 

the $i$th adjusted treatment total is 

(9.9.7)  $G_i = g_i - k^{-1} \sum_j K_{ij} h_j = g_i - T_i/k$ 

where $T_i$ is the sum of the block totals in which the $i$th treatment appears. The 
adjustment therefore consists in subtracting the sum of the block averages for 
those blocks in which the $i$th treatment occurs. 

The total sum of squares $S_t$ is given by 

(9.9.8)  $S_t = \sum_D x_{ij}^2 - G$ 

where $D$ refers to the set of pairs $(i, j)$ for which $K_{ij} = 1$ and $G$ is the correction 
for the grand mean, 

(9.9.9)  $G = \frac{\left( \sum_D x_{ij} \right)^2}{N}$ 

This total sum of squares is split into three parts: 

(9.9.10)  $S_t = S_{a(el.)} + S_{b(ign.)} + S_e$ 

where $S_{a(el.)}$ is the sum of squares for treatments, eliminating blocks, and $S_{b(ign.)}$ 
is the sum of squares for blocks, ignoring treatments. These are given respectively 
by 

(9.9.11)  $S_{a(el.)} = \sum_i \frac{G_i^2}{rE}$ 

(9.9.12)  $S_{b(ign.)} = \sum_j \frac{h_j^2}{k} - G$ 

The number $E$, called the efficiency factor of the design, is defined by 

(9.9.13)  $E = \frac{a(k - 1)}{k(a - 1)}$ 

and is less than 1 because $k < a$. The sum of squares for error, $S_e$, is obtained 
by subtraction. The number of degrees of freedom for error is $N - 1 
- (a - 1) - (b - 1) = N - a - b + 1$. The analysis of variance is outlined 
in Table 9.9. 


TABLE 9.9 
Variation                         | S.S.                    | D.F.            | E(M.S.) 
Treatments (eliminating blocks)   | $\sum_i G_i^2/(rE)$     | $a - 1$         | $\sigma^2 + rE\sigma_\alpha^2$ 
Blocks (ignoring treatments)      | $\sum_j h_j^2/k - G$    | $b - 1$         | 
Error (by subtraction)            |                         | $N - a - b + 1$ | $\sigma^2$ 
Total                             | $\sum_D x_{ij}^2 - G$   | $N - 1$         | 

In the last column of Table 9.9, $\sigma_\alpha^2$ is defined as usual (for fixed effects) by 

(9.9.14)  $\sigma_\alpha^2 = \frac{\sum_i \alpha_i^2}{a - 1}$ 

Treatment effects may be tested for significance by the ordinary F-test, and 
estimated by 

(9.9.15)  $\hat{\alpha}_i = \frac{G_i}{rE}$ 

If it is desired to test the hypothesis that all the block effects are zero, the 
numerator of the F-statistic is the mean square for “blocks eliminating 
treatments,” given by $S_{b(el.)}/(b - 1)$, where 

(9.9.16)  $S_{b(el.)} = \sum_j \frac{h_j^2}{k} + \sum_i \frac{G_i^2}{rE} - \sum_i \frac{g_i^2}{r}$ 

However, the block effects are usually regarded as less important than the 
treatment effects. 
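
The adjusted treatment totals, the efficiency factor and the sums of squares of Table 9.9 can be sketched as below. This is an illustration under stated assumptions: the design is a cyclic arrangement of seven treatments in seven blocks of four (not the arrangement of Table 9.8), the function name `bibd_analysis` is hypothetical, and the observations are generated without error from invented effects so that the estimates can be compared with the values put in.

```python
# Intra-block analysis of a balanced incomplete block design, Eqs. (9.9.6)-(9.9.15).

def bibd_analysis(blocks, x, a):
    """blocks[j] lists the treatments in block j; x[(i, j)] is the observation."""
    b, k = len(blocks), len(blocks[0])
    n = b * k
    r = n // a
    g = [sum(x[(i, j)] for j in range(b) if i in blocks[j]) for i in range(a)]
    h = [sum(x[(i, j)] for i in blocks[j]) for j in range(b)]
    T = [sum(h[j] for j in range(b) if i in blocks[j]) for i in range(a)]
    G_adj = [g[i] - T[i] / k for i in range(a)]       # adjusted treatment totals
    E = a * (k - 1) / (k * (a - 1))                   # efficiency factor
    corr = sum(x.values()) ** 2 / n                   # correction for the grand mean
    s_total = sum(v * v for v in x.values()) - corr
    s_treat = sum(v * v for v in G_adj) / (r * E)     # treatments, eliminating blocks
    s_block = sum(v * v for v in h) / k - corr        # blocks, ignoring treatments
    return {"G_adj": G_adj, "E": E,
            "S_treat": s_treat, "S_block": s_block,
            "S_error": s_total - s_treat - s_block,
            "alpha_hat": [v / (r * E) for v in G_adj]}

a = 7
blocks = [sorted((j + d) % a for d in (2, 4, 5, 6)) for j in range(a)]  # assumed design
alpha = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]        # invented treatment effects
beta = [1.0, -2.0, 0.5, 3.0, -1.5, 0.0, 2.0]          # invented block effects
x = {(i, j): 50.0 + alpha[i] + beta[j]
     for j, blk in enumerate(blocks) for i in blk}    # error-free observations

out = bibd_analysis(blocks, x, a)
print("E =", out["E"])
print("estimated effects:", [round(v, 6) for v in out["alpha_hat"]])
```

Because the data contain no error term, the estimates reproduce the invented treatment effects and the error sum of squares is zero apart from rounding; with real data the error mean square would supply the denominator of the F-test.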
For details of the many other types of experimental design used in practice 
see reference [9], and for a full treatment of the theoretical considerations 
involved, reference [7] is invaluable. 


9.10 Departures from the Assumptions Underlying Analysis of Variance 
Techniques The assumptions usually made fall under four heads: (a) additivity 
of effects, which implies an absence of interaction, (b) independence of the 
error terms, (c) normality of the distribution of error terms, (d) constancy of 
the variance of error terms, whatever the magnitude of the main effects. Taken 
together, these constitute a severe restriction on the type of data to which the 
techniques of analysis of variance are strictly relevant. It is highly desirable to 
know whether these assumptions can be relaxed appreciably in practice. 

As regards non-normality, some empirical data on sampling from artificial 
non-normal populations suggest that the ordinary F-test is fairly robust and 
can be used without serious error even for considerable variations from nor- 
mality. Care should be used in claiming significance when the probability is 
near the border-line, since, on the whole, non-normality tends to make results 
look more significant than they are. 

Non-normality does not introduce any bias into point estimates of para- 
meters (or linear combinations of parameters) or of the components of variance. 
It does, however, affect the validity of the F-tests, since without the assumption 
of normality it is not in general true that the mean squares have independent 
chi-square distributions. 

Inferences about the mean $\mu$ which are valid for a normal population will 
also hold for almost any non-normal population as long as the sample is very 
large. This, however, is not true for the population variance $\sigma^2$. If the 
population has a kurtosis $\gamma_2$, the variance of $s^2/\sigma^2$, where $s^2$ is the sample variance, 
is increased for large N by a factor $1 + \frac{1}{2}\gamma_2$, and this may seriously affect any 
significance levels or calculations of power obtained from normal theory. Most 
inferences about variances, including F-tests, are subject to uncertainty due to 
non-normality, but these effects are less when the design is such that equal 
numbers of observations occur in each cell of the layout. 

As regards independence, the simplest reasonable alternative is probably to 
assume that the observations are serially correlated. That is, if the observations, 
taken successively, are denoted by $x_1, x_2, \ldots$, there is a constant correlation $\rho$ 
between $x_i$ and $x_{i+1}$, but all other correlation coefficients are zero. It may be 
shown that under these conditions 

(9.10.1)  $E(\bar{x}) = \mu$ 
          $V(\bar{x}) = \frac{\sigma^2}{N} \left[ 1 + 2\rho \left( 1 - \frac{1}{N} \right) \right]$ 
          $E(s^2) = \sigma^2 (1 - 2\rho/N)$ 

with $\rho$ taking possible values from $-\frac{1}{2}$ to $\frac{1}{2}$. The probability that a confidence 
interval with nominal confidence coefficient $1 - \alpha$ does not cover the true 
value $\mu$ is, for large N, not $\alpha$ but $2[1 - \Phi(A)]$ where $A = z_{\alpha/2}(1 + 2\rho)^{-1/2}$, $z_{\alpha/2}$ 
being the standard normal variate exceeded with probability $\alpha/2$. For $\rho = -\frac{1}{2}$ 
and $\alpha = 0.05$, the probability $2[1 - \Phi(A)]$ is zero and for $\rho = \frac{1}{2}$ it is 0.166, so 
that there is a chance of rather serious error in ignoring $\rho$. 
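
The large-sample non-coverage probability quoted above is easily evaluated for other values of the serial correlation. The sketch below assumes the formula as read here, with A equal to the normal deviate divided by the square root of 1 + 2ρ; the function name `true_noncoverage` is hypothetical.

```python
from statistics import NormalDist

def true_noncoverage(alpha, rho):
    """Large-N chance that the nominal 1 - alpha interval misses the mean."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)     # deviate exceeded with probability alpha/2
    if 1 + 2 * rho <= 0:
        return 0.0                            # A becomes infinite at rho = -1/2
    A = z / (1 + 2 * rho) ** 0.5
    return 2 * (1 - std_normal.cdf(A))

for rho in (-0.5, -0.25, 0.0, 0.25, 0.5):
    print(rho, round(true_noncoverage(0.05, rho), 3))
```

At ρ = 0 the function returns the nominal 0.05, and at ρ = ½ it gives the value 0.166 quoted in the text.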

The effect of inequality of variances in the one-way layout with fixed effects 
(§ 9.2) depends upon the sample sizes. If these are all equal, the effect is slight, 
but in general the variance of the ratio of mean squares (between and within 
samples) is increased, even for large values of $N_i$, by this inequality. The result 
is to increase the true probabilities for Type I errors, in the usual F-test for 
equality of means, beyond the nominal value $\alpha$. The effect may be quite serious, 
leading even to a doubling or trebling of the error. 

The most common procedure to reduce inequality of variance is to make 
use of transformations such as those mentioned in §§ 3.15, 3.16 and 4.8. The 
logarithmic transformation (using the logs of the observations instead of the 
observations themselves) is appropriate when their percentage error is 
approximately constant. Transformations to reduce inequality of variance often reduce 
non-normality also, but they may destroy the additivity of effects which existed 
before the transformation. 


9.11 Estimation of Contrasts The ordinary F-test in an analysis of variance 
determines whether there is a significant difference between a group of means, 
but the investigator really wants to know more than this—which of the means 
differ significantly from which others. One should not, of course, pick out the 
two means which differ most (out of k samples, say) and apply the ordinary t-test 
to these, since the two selected in this way are clearly not chosen at random. 
There are k(k — 1)/2 pairs that might be chosen, of which the selected pair is one. 

Any linear combination of treatment means (or other parameters) with 
coefficients adding up to zero is called a contrast. A difference of two means 
such as $\alpha_1 - \alpha_2$ is a contrast and so is an interaction: $\gamma_{ij} = m_{ij} - m_{i.} - m_{.j} + m_{..}$ 

Fisher has pointed out [10] that when one wishes to test a particular contrast 
(picked out after the results of the experiment are known), the F-test having 
failed to demonstrate a significant differentiation, one should be very cautious 
in claiming significance. He suggests that if the contrast is one out of $k(k - 1)/2$ 
possibilities, we should require significance at the level $\alpha/[k(k - 1)/2]$ instead of $\alpha$. 

As an illustration we may consider the data of Example 1, § 9.1. The F-test 
gives a non-significant difference between means, but let us nevertheless make a 
t-test for the two samples 1 and 3, for which the estimated difference is 147. 
As an independent estimate of the population variance, $\sigma^2$, we may use the 
combined mean square within samples, which is 7619, with 14 d.f. The t-value 
is $147/[7619(\frac{1}{4} + \frac{1}{3})]^{1/2} = 2.20$, and this is significant at the 5% level. The 
probability of a value numerically as great is about 0.045. If, however, we 
consider this difference as simply one out of 10, we should require a t-value 
of 3.33 (corresponding to a probability 0.005) before claiming significance. This 
would mean a difference of means of at least 222. 
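
The arithmetic of this illustration can be retraced in a few lines. The sample sizes 4 and 3 used below are an assumption (they reproduce the quoted t-value of 2.20 with the stated mean square), and the critical value 3.33 is simply taken from the text rather than computed.

```python
import math

ms_within = 7619.0        # combined mean square within samples (14 d.f.), from the text
difference = 147.0        # estimated difference between samples 1 and 3, from the text
n_first, n_third = 4, 3   # assumed sample sizes

std_error = math.sqrt(ms_within * (1 / n_first + 1 / n_third))
t_value = difference / std_error

k = 5                                  # number of samples, giving k(k - 1)/2 = 10 pairs
pairs = k * (k - 1) // 2
adjusted_level = 0.05 / pairs          # level required for one contrast out of ten
t_required = 3.33                      # quoted critical value for that level
required_difference = t_required * std_error

print(round(t_value, 2), pairs, adjusted_level, round(required_difference))
```

The observed difference of 147 passes the ordinary 5% test but falls well short of the difference required once the choice among ten possible pairs is allowed for.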
In general terms, for $k$ samples each of size $N$, Fisher's test requires us to 
calculate a significant difference $D$ given by 

(9.11.1)  $D = t_{\epsilon,\nu} (2M_e/N)^{1/2}$ 

where $M_e$ is the mean square for error, with $\nu$ degrees of freedom, 
$\epsilon = \alpha/[k(k - 1)/2]$, and $t_{\epsilon,\nu}$ is the upper $\epsilon/2$ point of the t-distribution for $\nu$ 
degrees of freedom. Then $D$ is the difference which should be regarded as 
significant at level $\alpha$. The $100(1 - \alpha)$% confidence limits for the difference 
between two population means are $\bar{x}_i - \bar{x}_j \pm D$, where $\bar{x}_i$ and $\bar{x}_j$ are the 
observed sample means. If only differences as great as $D$ are reported as 
significant, the expected number of wrong statements per experiment will be $\alpha$. 
Thus for $\alpha = 0.05$ we should make only one wrong statement in 20 experiments 
of the same type. 

If the F-test has shown significance and one desires to know which means 
differ significantly from which others, the Newman-Keuls procedure is perhaps 
the best. Let $a$ be the number of levels of the factor under consideration (say 
treatments), and let $M_e$ be the mean square used as the denominator of F in 
testing for this factor and $\nu$ the number of degrees of freedom corresponding to 
$M_e$. Also let $r$ be the number of observations at each level of the factor. The 
observed means are arranged in order from the smallest to the greatest, and the 
test is carried out on sub-groups of $p$ successive means beginning with the 
whole set (for which $p = a$). For any such group the significance is tested by 
comparing the observed range with the critical range, the latter being obtained 
from the distribution of the studentized range. 

The studentized range of $p$ observations, having an actual range $R$ and 
coming from a population with variance $\sigma^2$, is $R/s_\nu$, where $s_\nu^2$ is an independent 
estimate of $\sigma^2$ based on $\nu$ degrees of freedom. (Note that the standardized range 
is $R/\sigma$.) In the analysis of variance the estimate is usually based on the mean 
square for error. 

The probability integral (distribution function) of the studentized range is 
given by 

(9.11.2)  $F(q) = P(R/s_\nu < q) = \frac{\nu^{\nu/2}}{\Gamma(\nu/2)\, 2^{\nu/2 - 1}} \int_0^\infty x^{\nu - 1} e^{-\nu x^2/2} G(qx)\, dx$ 

where $G(w)$ is the probability integral for the standardized range, § 8.21. Tables 
of percentage points of the studentized range may be found in [11], for the 
appropriate values of $p$ and $\nu$. If $q$ is the upper 5% point, the critical range is 
found by multiplying $q$ by $s_\nu$. In the Newman-Keuls procedure the observations 
are each a mean of $r$ original observations, and the estimated $s_\nu^2$ is therefore 
$M_e/r$. 

EXAMPLE 3  Table 9.10 gives measurements of tensile strength (kg/cm²) 
on specimens of rubber. There were five batches, each batch affording six 
specimens. The mean, range and variance for each batch are given. 



TABLE 9.10 
                      Batch Number 
             1       2       3       4       5 
             177     116     170     181     177 
             172     179     156     190     186 
             137     182     188     210     199 
             196     143     212     173     202 
             145     156     164     172     204 
             168     174     184     187     198 
Mean         165.8   158.3   179.0   185.5   194.3 
Range        59      66      56      38      27 
Variance     468.6   653.1   406.0   196.3   111.5 


The within-batch estimate of variance is $M_e = (1835.5)/5 = 367.1$, with 
25 d.f., and $r = 6$, so that $s_\nu = 7.82$. The value of $q$ for $\nu = 25$ and $p = 5$ is 
4.16, giving a critical range of 32.5. 

Arranging the actual means in order, we get 


158.3, 165.8, 179.0, 185.5, 194.3 


For the whole set, R = 36.0, which exceeds the critical range, and we may 
therefore conclude that there is a significant difference at the 5% level for the 
whole group of means. This test can therefore be regarded as a substitute for 
the analysis of variance test. In the present example the mean square between 
batches is 1273.5 with 4 d.f., and F = 3.47. Since the 5% point of F, with 4 
and 25 d.f., is 2.76 and the 1% point is 4.18, the observed value is significant 
at the 5% level, which agrees with the result of the Newman-Keuls test. 

We next proceed to omit the largest (or smallest) mean and find the critical 
range for both the remaining sets of 4 means. If a significant difference is found 
we test groups of 3 means, and so on. Any time that a group of means is found 
not to differ significantly it is underlined. Any two means not underlined by 
the same line differ significantly. 

The q values for $p$ = 2, 3, 4, with $\nu = 25$, are as follows (upper 5% points), 
giving the critical ranges shown: 

p                 2      3      4 
Critical range    22.8   27.5   30.4 

Omitting 194.3, the range of the means is 27.2, and omitting 158.3, it is 28.5, 
neither of which is significant. It is not therefore necessary to go any further. 
The set of means, underlined, is as shown below: 

158.3  165.8  179.0  185.5  194.3 
--------------------------
       --------------------------

Only the contrast of the first and last is significant at the 5% level. 
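
The first two stages of the Newman-Keuls procedure for Example 3 can be retraced as follows. This is a sketch only: the observations are those of Table 9.10, while the studentized-range point 4.16 (for p = 5) and the critical range 30.4 (for p = 4) are copied from the text rather than computed, and the variable names are hypothetical.

```python
import math
from statistics import mean, variance

batches = {
    1: [177, 172, 137, 196, 145, 168],
    2: [116, 179, 182, 143, 156, 174],
    3: [170, 156, 188, 212, 164, 184],
    4: [181, 190, 210, 173, 172, 187],
    5: [177, 186, 199, 202, 204, 198],
}
r = 6                                             # specimens per batch

means = sorted(mean(values) for values in batches.values())
m_e = mean(variance(values) for values in batches.values())   # within-batch mean square
s_nu = math.sqrt(m_e / r)                         # standard error of a batch mean

critical_range = {5: 4.16 * s_nu, 4: 30.4}        # values quoted in the text

def significant(group):
    """Compare the range of an ordered group of means with its critical range."""
    return (group[-1] - group[0]) > critical_range[len(group)]

whole_set = significant(means)                    # all five means
without_largest = significant(means[:-1])         # omitting 194.3
without_smallest = significant(means[1:])         # omitting 158.3

ms_between = r * variance(means)                  # mean square between batches, 4 d.f.
f_ratio = ms_between / m_e

print(round(m_e, 1), round(s_nu, 2), round(f_ratio, 2))
print(whole_set, without_largest, without_smallest)
```

Since neither set of four means is significant, no smaller groups need be examined, in agreement with the underlining shown above.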
Applying the same method to the data of Example 1 (and assuming for 
convenience that the samples are all of the same size 4), we find that the critical 
range for $p = 5$ is 192 while the actual range is 147. There is therefore no 
significant difference between the means, as was found previously by the F-test, 
and the contrast of even the least and greatest is non-significant at the 5% level. 


* 9.12 The Power of Analysis of Variance Tests  In calculating the power of 
the F-test, it is convenient to make the change of variable 

(9.12.1)  $x = \frac{n_1 F}{n_1 F + n_2}$ 

where $n_1$ and $n_2$ are the degrees of freedom for F. Under the null hypothesis $H_0$, 
$x$ is an ordinary beta-variate with parameters $\frac{1}{2}n_1$, $\frac{1}{2}n_2$. The distribution under 
the alternative hypothesis $H_1$ was worked out by Tang [12]. 

If we consider the one-way layout, with $k$ samples each of size $r$, and if 
$S_a$, $S_e$ are the sums of squares between treatments and within samples 
respectively, we have 

(9.12.2)  $x = \frac{S_a}{S_a + S_e}$,  $n_1 = k - 1$,  $n_2 = k(r - 1)$ 

With the usual notation for fixed treatment effects $\alpha_i$ (with $\sum_i \alpha_i = 0$), the 
null hypothesis is that $\sigma_\alpha^2 = 0$, where 

(9.12.3)  $\sigma_\alpha^2 = \frac{\sum_i \alpha_i^2}{k - 1}$ 

The alternative hypothesis $H_1$ is that not all the $\alpha_i$ are 0, so that $\sigma_\alpha^2$ is not 0. 
The population variance is assumed constant and equal to $\sigma^2$. 

Under $H_0$, $S_a/\sigma^2$ has the $\chi^2$ distribution with $n_1$ d.f. Under $H_1$ it has the 
non-central $\chi^2$ distribution (§ A.13) with non-centrality parameter 

(9.12.4)  $\lambda = r(k - 1) \frac{\sigma_\alpha^2}{2\sigma^2}$ 


On the other hand, $S_e/\sigma^2$ still has the ordinary $\chi^2$ distribution with $n_2$ d.f. 
The density function for $x$ under $H_1$ is 

(9.12.5)  $f(x) = \frac{e^{-\lambda} x^{\frac{1}{2}n_1 - 1} (1 - x)^{\frac{1}{2}n_2 - 1} H(\lambda x)}{B(\frac{1}{2}n_1, \frac{1}{2}n_2)}$ 

where 

(9.12.6)  $H(\lambda x) = 1 + \frac{n}{n_1} \lambda x + \frac{n(n + 2)}{n_1(n_1 + 2)} \frac{(\lambda x)^2}{2!} + \cdots$ 

with $n = n_1 + n_2$. This function is called the confluent hypergeometric function. 
When $\lambda = 0$, $H(0) = 1$ and $x$ is a beta-variate. 

The null hypothesis is rejected when $x > x_\alpha$, $x_\alpha$ being determined for a 
given Type I error $\alpha$ by the relation 

(9.12.7)  $\alpha = \int_{x_\alpha}^{1} f(x \mid \lambda = 0)\, dx = 1 - I_{x_\alpha}(\frac{1}{2}n_1, \frac{1}{2}n_2)$ 

where $I$ is the incomplete beta-function ratio tabulated by K. Pearson (§ 4.5). 
The power of the test is given by 

(9.12.8)  $P = 1 - \beta = \int_{x_\alpha}^{1} f(x)\, dx$ 

where $f(x)$ is the function defined in Eq. (5). Tang's tables give values of $x_\alpha$ 
corresponding to $\alpha$ = 0.05 and 0.01 and also the power $P$ for selected values 
of $\phi$. (Tang denotes our $x$ by $E^2$, and our $\lambda$ by $k\phi^2/2$.) 


EXAMPLE 4  Suppose we have four treatments, each with five replicates. 
Then $n_1 = 3$, $n_2 = 16$. (If the experiment were done in randomized complete 
blocks, we should have to subtract the four degrees of freedom between blocks 
and so obtain $n_2 = 12$.) Suppose the true treatment effects $\alpha_i$, expressed as 
percentages of $\mu$, are -5, -4, 3, 6, and let an estimate of $\sigma$ be 10% of $\mu$ (obtained 
perhaps from previous experience). Then $\sigma_\alpha^2 = 86/3 = 28.7$, $\sigma^2 = 100$, 
$\lambda = 2.15$, $\phi = 1.04$, $n_1 = 3$, $n_2 = 16$. The tables show that for $\alpha = 0.05$, the 
power $P$ when $\phi = 1.04$ is about 0.31, so that the chance of detecting differences 
as great as those just mentioned is only about 3 out of 10. 
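
In place of Tang's tables, the power in Example 4 can be approximated numerically by treating the density of x under the alternative as a mixture of beta densities with Poisson weights, which is what the series in the density amounts to. The sketch below is an assumed numerical route (Simpson's rule and bisection), not the method of the tables; the function names are hypothetical, and the integration as written presumes both degrees of freedom exceed 2.

```python
import math

def beta_tail(x0, p, q, steps=2000):
    """Integral from x0 to 1 of the Beta(p, q) density, by Simpson's rule."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)

    def pdf(x):
        if x <= 0.0 or x >= 1.0:
            return 0.0                      # end-point value when p > 1 and q > 1
        return math.exp(log_norm + (p - 1) * math.log(x) + (q - 1) * math.log(1 - x))

    h = (1.0 - x0) / steps
    total = pdf(x0) + pdf(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(x0 + i * h)
    return total * h / 3.0

def critical_x(alpha, p, q):
    """Value x_alpha with upper-tail probability alpha under the null hypothesis."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                     # bisection on the decreasing tail area
        mid = 0.5 * (lo + hi)
        if beta_tail(mid, p, q) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power(lam, n1, n2, alpha=0.05, terms=60):
    """Power of the F-test for non-centrality lam, as a Poisson mixture of beta tails."""
    p, q = n1 / 2.0, n2 / 2.0
    x_alpha = critical_x(alpha, p, q)
    weight = math.exp(-lam)                 # Poisson weight for the first term
    total = 0.0
    for j in range(terms):
        total += weight * beta_tail(x_alpha, p + j, q)
        weight *= lam / (j + 1)
    return total

effects = [-5, -4, 3, 6]                    # treatment effects of Example 4 (% of the mean)
k, r, sigma_sq = 4, 5, 100.0
sigma_alpha_sq = sum(e * e for e in effects) / (k - 1)
lam = r * (k - 1) * sigma_alpha_sq / (2 * sigma_sq)
n1, n2 = k - 1, k * (r - 1)

print("lambda =", round(lam, 2))
print("power  =", round(power(lam, n1, n2), 2))
```

Setting the non-centrality to zero returns the significance level itself, which provides a simple check on the numerical integration.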

The same method will apply to a Latin square experiment. If the square is 
of side $m$, the degrees of freedom are $n_1 = m - 1$, $n_2 = (m - 1)(m - 2)$. 

If we use Model II in which the treatment effects are random with variance 
$\sigma_\alpha^2$, the power of the test is a function of $1 + r\sigma_\alpha^2/\sigma^2$ (see § 9.5). 

Recent investigations on the effect of non-normality suggest that moderate 
amounts of skewness and kurtosis have comparatively little effect on the power, 
kurtosis being more important than skewness in this respect. 
PROBLEMS 

A. (§§ 9.1-9.2) 

1. Five samples, each of four seasoned mine-props, were tested for maximum 
load. The means and standard deviations of the maximum load (in units of 1000 lb wt) 
were as follows: 

Sample No.    $\bar{x}_i$    $s_i = (s_i^2)^{1/2}$ 
1             42.0     10.10 
2             52.0     12.06 
3             65.5     5.45 
4             51.8     9.54 
5             73.5     19.26 

Test approximately the homogeneity of the variances, using Bartlett’s method. 

2. Prove that when $k = 2$, the sum of squares between samples reduces to 

$\frac{N_1 N_2}{N_1 + N_2} (\bar{x}_1 - \bar{x}_2)^2$ 


3. Show that the t-test for the significance of a difference between two sample 
means is a special case (when $k = 2$) of the F-test. Hint: $S_b$ reduces to the 
expression given in Problem 2, and $F = (N - 2)S_b/S_w$, and this is the square of $t$. 
4. The following table represents the yield of wheat, in bushels per acre, for trial 
plots of land treated with four different levels of fertilizer. Each level was applied to 
five plots randomly chosen over a field. 

Plot              Treatments 
Number      1     2     3     4 
1           21    24    34    40 
2           25    33    26    47 
3           31    34    38    39 
4           17    39    32    41 
5           26    35    35    33 

Determine whether there is a significant treatment effect. 
5. Seed yields in hundreds of pounds per acre from three replicates of four varieties 
of alfalfa were as follows: 

                  Varieties 
Plot        1      2      3      4 
1           4.7    5.1    9.1    4.9 
2           5.2    4.8    6.3    5.2 
3           2.4    3.2    5.8    5.3 

Does there appear to be a significant difference between varieties? 
B. (§§ 9.3-9.5) 

1. The data represent sugar yields (tons/acre) for nine varieties of sugar beet. 
The design consists of five blocks each of nine plots, with no replications. The varieties 
were randomized among the plots in each block. Test for significant differences between 
varieties, and between blocks. Hint: The interaction, if any, is included with the error. 

                              Variety 
Block   A      B      C      D      E      F      G      H      J 
1       1.94   1.70   2.23   2.14   1.80   1.82   1.91   1.90   1.98 
2       2.08   1.96   2.26   2.08   2.23   2.06   2.06   2.25   2.03 
3       1.86   1.83   2.22   2.16   1.67   2.03   2.22   1.92   1.81 
4       2.21   1.60   2.08   2.16   2.11   1.96   2.14   1.99   1.77 
5       2.03   2.13   2.02   2.17   2.01   2.28   2.28   2.02   1.88 


2. In а greenhouse experiment on wheat, four fertilizer treatments of the soil 
and four chemical treatments of the seed were used (including in each case a control 
with no treatment). Each combination was applied to three plots which were placed 
at random in the available space. Show that there is negligible interaction between 
chemical treatments and fertilizers, but a large effect due to fertilizers. 


Chemical Treatment 


Fertilizer 
1 2 3 4 
1 21.4, 21.2, 20.1 | 20.9, 20.3, 19.8 | 19.6, 18.8, 16.6 | 17.6, 16.6, 17.5 
2 12.0, 14.2, 12.1 | 13.6, 13.3, 11.6 | 13.0, 13.7, 12.0 | 13.3, 14.0, 13.9 
3 13.5, 11.9, 13.4 | 14.0, 15.6, 13.8 | 12.7, 12.9, 13.1 | 12.4, 13.7, 13.0 
4 12.8, 13.8, 13.7 | 14.1, 13.2, 15.3 | 14.2, 13.6, 13.3 | 12.0, 14.6, 14.0 


3. In an experiment to determine whether five makes of automobile average the 
same number of miles per gallon, three cars of each make were selected at random in 
each of three cities and given a test run on one gallon of a standard gasoline. The 
table gives the number of miles travelled. Make an analysis of variance and determine 
whether there is a significant effect (a) of makes, (b) of cities. 


Make      Los Angeles          San Francisco        Portland

          20.3, 19.8, 21.4     21.6, 22.4, 21.3     19.8, 18.6, 21.0
          19.5, 18.6, 18.9     20.1, 19.9, 20.5     19.6, 18.3, 19.8
          22.1, 23.0, 22.4     20.1, 21.0, 19.8     22.3, 22.0, 21.6
          17.6, 18.3, 18.2     19.5, 19.2, 20.3     19.4, 18.5, 19.1
          23.6, 24.5, 25.1     17.6, 18.3, 18.1     22.1, 24.3, 23.8


4. Estimate the separate treatment effects in Problem A-4 and the separate variety
effects in Problem A-5, using Model I.

5. Estimate the variety effects and block effects in Problem B-1, using Model I.


7. Prove the following identities: 
(а) Xu Gi. — би. — Ži. — Xa. 9$ =0 
(b) Xu Ga. — Du. — Xe. — хз. + X)-0 
8. In Problem A-4, assume that the treatments are random selections from a normal
population (e.g., samples of fertilizer might differ in the proportions of an active 
ingredient, and these samples might be chosen at random and applied in equal amounts 
in each treatment). Calculate the components of variance, the intra-class correlation, 
and the power of the F-test corresponding to a size of 0.05. 

9. In an experiment on yield of sugar beets (tons/acre) there were two levels of 
irrigation treatment and three of fertilizer treatment, and each combination of treat- 
ments was carried out in five replications. The analysis of variance table was as follows: 


Variation        S.S.     D.F.     M.S.
Irrigation      120.0       1     120.0
Fertilizer      221.7       2     110.9
Interaction      35.0       2      17.5
Error           108.0      24       4.5
Total           484.7      29


Assuming that it makes sense to regard the irrigation and fertilizer effects as random 
(Model II), estimate the components of variance. 


C. (§§ 9.6-9.8)

1. Regard Problem B-3 as one of mixed type (Model III), with the makes of cars
fixed and the cities random. (That is, the results will apply only to the particular makes
selected for the experiment, but there may be a wide variety of possible cities chosen
for the trials.) Estimate the components of variance, and show that because of the
high interaction there is not, on this model, a significant effect as between makes of
cars.

2. Show that the sums of squares in Eq. (9.7.7) may be written: 


ба = r1 Y (b)- (Y: xy)? — М 

7 jk 
Sy = rl р, (È xg? — r1 È (b) (У хук)? 

их i Jk 
5, = >, Gui?) — ry (У хук)? 

ijk 7 К 

purport to represent breaking strengths 
rand, purchased in three different cities. 


- Calculate the mean squares for cities, for 
Xes, and test for the reality of a city effect 


City 1 2 3 
Boxes 1 2 1 2 3 1 2 3 4 
19 172 | 2.44 227 246 | 136 159 1.73 153 
L80 140 | 211 270 221 | 143 1.50 1.74 14 
L7 20 | 241 236 250 | 148 1.50 1.65 1.64 
1.69 175 | 2.48 236 237 | 155 149 1.58 1.51 
171 195 | 236 216 224 | 153 147 149 1.52 
1.61 | 236 204 225 | 139 163 170 136 
The boxes may be supposed chosen at random from a large number in each city.
Treat the cities as random also and estimate the components of variance. Hint: Use 
the computation formulas of Problem 2. 

4. In a Latin square layout, let $T_c$ be the sum of squares of the column totals, $T_r$
the S.S. of the row totals and $T_t$ the S.S. of the treatment totals. Prove that $S_a$, $S_b$, $S_c$
in Eq. (9.8.4) may be written: $S_a = T_c/m - C$, $S_b = T_r/m - C$, $S_c = T_t/m - C$.

5. The following table gives wheat yields (bushels/acre) for five fertilizer-treatments
of plots arranged in a Latin square. Test for significance of row, column and treatment
effects, using Model I.


                     Columns
Rows      1        2        3        4        5
  1     34(C)    21(A)    52(E)    24(B)    40(D)
  2     33(B)    45(E)    47(D)    26(C)    25(A)
  3     31(A)    38(C)    34(B)    39(D)    38(E)
  4     44(E)    41(D)    32(C)    17(A)    39(B)
  5     33(D)    35(B)    26(A)    46(E)    35(C)


Hint: Use the computation method suggested in Problem 4. 


D. (§§ 9.9-9.12)

1. Five detergents, lettered A to E, were compared as to number of soiled dinner
plates washed in a basin before the foam disappeared from the basin. In each block
there were three basins containing three different detergents, and the three dish-washers
rotated after washing each plate. Analyze this balanced incomplete block experiment.
(Data from Scheffé [7].) Test for differences between blocks as well as between
detergents.


Block 
Detergent 

1 2 3 4 5 6 7 8 9 10 
A 27 28 30 31 29 30
B 26 26 29 30 21 26
C 30 34 32 34 31 33
D 29 33 34 31 33 31 
E 26 24 25 23 24 26 


2. In Problem B-1, calculate a significant difference between pairs of varieties by
Fisher's method, using 0.05 as the value of $\alpha$. Which pairs appear to be significantly
different at this level? Hint: To find the $\alpha/2$ point for t, use one of the approximations
in § 8.6.

3. Apply the Newman-Keuls procedure to the data of Problem B-1, using the 5%
level of significance. Find which pairs are significantly different at this level. Hint:
For $\nu = 32$, the upper 5% values of $q$ for $p = 9$, 8, 7 and 6 are 4.70, 4.58, 4.45 and 4.29
respectively.

4. A set of eight different kinds of alloy steels is to be tested for tensile strength.
It is expected that the strengths will be of the order of 150,000 lb/in² and that the standard
deviation for any one kind will be of the order of 3000 lb/in². How many specimens
should be used for each kind of alloy if we want the error of the first kind not to exceed
5% and if we would like the probability of rejecting the null hypothesis to be at least
... if in fact two of the alloys differ by 10,000 lb/in² or more? (Scheffé [7]). Hint: The
minimum value of $\sum \alpha_i^2$ satisfying the condition that two $\alpha_i$ differ by 10,000 is given by
putting $\alpha_1 = -5{,}000$, $\alpha_2 = 5{,}000$, and all the other $\alpha_i = 0$. This makes $\sigma^2 >$
(50,000)/7. By means of charts prepared by Pearson and Hartley [13], suitable values of
$r$ and $\phi$ can be read off. Tang's tables are not so convenient for this purpose.


Chapter 10
NON-PARAMETRIC STATISTICAL TESTS 


10.1 Non-Parametric or Distribution-Free Tests Many common statistical
tests are concerned with estimating parameters in a distribution function of
known or assumed form (for the population), or with testing the significance of
differences between samples on the hypothesis that they come from such a
population. Thus the population may be supposed to be normal, with parameters
$\mu$ and $\sigma^2$, and these may be estimated by the sample statistics $m$ and $s^2$, or
we may use an F-test to decide whether two samples which differ in variance
may reasonably be presumed to come from normal populations with the same
variance $\sigma^2$. Such tests, since they deal with population parameters, are called
parametric.

There are, however, other tests which do not require assumptions about the
parameters of the population from which the sample is drawn. These are called
non-parametric, or distribution-free, since they are free of specific assumptions
about the distribution in the parent population. Some assumptions, of course,
have to be made (for example, that the observations constituting the sample
are independent), but these assumptions are considerably weaker than those
required for the usual parametric tests. The most obvious danger in using such
tests as the t-test and the F-test is that the underlying assumption of normality
may not be justified. It is true that these tests appear to be fairly insensitive to
considerable departures from normality (they are said to be robust tests) but
nevertheless for some kinds of data it would be rash to assume anything like a
normal distribution and in such cases non-parametric tests should be used.

Non-parametric tests are generally simple to apply, not involving much
computation. In those cases where a parametric test would also be applicable, a
non-parametric test will naturally be less powerful than the parametric one.
However, if it is fairly easy to obtain new observations, the lack of power may
be compensated by increase of the sample size, and the non-parametric test
appeals just because of its simplicity. In this chapter some of the commoner
tests will be described. These tests are particularly interesting to students of the
behavioral sciences because so much of the data in psychology, education, etc.
is of a kind that can be classified or ranked, but not accurately measured. Good
general surveys of non-parametric methods may be found in references [1], [2],
[3] and [17].


10.2 The Chi-Square Test of Hypotheses Suppose that the members of a
population can all be placed in one or other of a set of k categories. These may
be nominal like “male” or “female,” or may be intervals of the domain of a
measured variable like height. Also suppose that according to a certain hypothesis
$H_0$ the probabilities of falling in these classes should be $\pi_1, \pi_2, \ldots, \pi_k$. This
hypothesis may be tested by observing the actual frequencies $f_1, f_2, \ldots, f_k$ in a
sample of N items, corresponding to the respective classes. The distribution of
the N sample items among the k classes is multinomial (Appendix A.16), and
the number of degrees of freedom is k − 1, since the k frequencies are connected
by the linear relation $\sum f_i = N$. As shown in Appendix A.17, the
quantity

(10.2.1)  $\chi_s^2 = \sum_{i=1}^{k} \dfrac{(f_i - N\pi_i)^2}{N\pi_i}$

has in the limit the $\chi^2$ distribution with k − 1 d.f. It is assumed that the
quantities $\pi_i$ are given by the hypothesis $H_0$ and are not estimated from the
sample. If they have to be estimated, the degrees of freedom are reduced
(see § 10.3).

The quantity $N\pi_i$ is the expected frequency in the $i$th class for a sample of
size N, on hypothesis $H_0$, and may be denoted by $\phi_i$. The proof that $\chi_s^2$ is
approximately distributed like $\chi^2$ uses Stirling's approximation (Appendix A.2)
for $\phi_i!$ and therefore the $\phi_i$ should not be too small. It has been customary to
require that all the $\phi_i$ should be at least 5, but some studies [4] suggest that
values as low as 1 may sometimes be tolerated without causing serious error. In
practice, the classes with low expected frequency usually come near the ends of
the distribution, and it is common practice to combine or “pool” the end-classes
until the $\phi_i$ reach a satisfactory size. The objection to pooling is that some
important differences between $f_i$ and $\phi_i$ in the end-classes may be hidden by this
treatment. Since $\chi_s^2$ depends on the square of $f_i - \phi_i$, the sign of this difference
is ignored, and yet it might well be of significance for $H_0$ if the sign were constant
over several classes near the end of the table. For this reason, we recommend
that pooling should be done cautiously, only when any expected frequency
would otherwise fall close to or below 1. Too much pooling reduces the chance
of rejecting $H_0$ if it really should be rejected.

EXAMPLE 1 The Abbé Mendel, in a now classic experiment on heredity,
observed the shape and color of peas from a number of plants in the first-
generation progeny of a cross. He found that they could be put into four
groups, as follows:

Round and yellow       315
Round and green        108
Wrinkled and yellow    101
Wrinkled and green      32

According to his theory of heredity, these frequencies should be in the ratio
9:3:3:1. The expected frequencies $\phi_i$ for a total of 556 should therefore be as
shown in Table 10.1.

TABLE 10.1

$f_i$     $\phi_i$     $f_i - \phi_i$     $(f_i - \phi_i)^2$     $(f_i - \phi_i)^2/\phi_i$
315     312.75        2.25            5.06               0.016
108     104.25        3.75           14.06               0.135
101     104.25       −3.25           10.56               0.101
 32      34.75       −2.75            7.56               0.218
556     556           0                                  0.470


From the data, $\chi_s^2 = 0.470$, with 3 d.f. (since here, k = 4). The probability
of a value of $\chi^2$ as great as this is about 0.92, so that the agreement of theory
and experiment is very good. Considerably larger disagreement might be
expected, even when $H_0$ is true.
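
The arithmetic of Example 1 can be retraced with a short Python sketch. It is only an illustration: the function names `chi_square_statistic` and `chi_square_upper_tail` are hypothetical, and the tail probability is obtained from a power series for the incomplete gamma function rather than from the printed tables the text relies on.

```python
import math

def chi_square_statistic(observed, probabilities):
    """Return (statistic, expected) for Eq. (10.2.1) with hypothesized class probabilities."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, expected

def chi_square_upper_tail(x, df):
    """P(chi-square with df d.f. exceeds x), from the power series for the
    regularized lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= half_x / (a + k)
        total += term
    lower = total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))
    return 1.0 - lower

# Mendel's counts and the 9:3:3:1 hypothesis from Example 1.
observed = [315, 108, 101, 32]
ratio = [9, 3, 3, 1]
probabilities = [r / sum(ratio) for r in ratio]

statistic, expected = chi_square_statistic(observed, probabilities)
p_value = chi_square_upper_tail(statistic, df=len(observed) - 1)
print(f"chi-square = {statistic:.3f}, P = {p_value:.3f}")
```

The degrees of freedom are taken as k − 1 because the class probabilities come from the hypothesis and nothing is estimated from the sample.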

Very occasionally, one encounters values of $\chi_s^2$ so low that the corresponding
probabilities are as high as 0.99. When these are not due to mistakes in calcu-
lation, it may be suspected that the observations are not really random. The
hypothesis $H_0$ should not, of course, be rejected merely because of an agreement
that is “too good to be true,” but this kind of agreement might well be a ground
for critical reappraisal of the data.


10.3 The Chi-Square Test of Goodness of Fit An observed frequency
distribution in a sample may often, on general theoretical grounds, be supposed
to arise from a true binomial, Poisson, normal, or some other known type of
distribution in the population. This hypothesis may be tested by comparing the
observed frequencies in various classes with those which would be given by the
assumed theoretical distribution. Usually, however, the parameters of this
distribution will not be known from prior considerations but will have to be
estimated from the sample. It may be shown (see for example [5]) that if s
parameters are estimated by the method of maximum likelihood, the limiting
distribution of $\chi_s^2$ is that of $\chi^2$ with k − s − 1 d.f. Each additional parameter
estimated from the sample introduces in effect a linear restriction on the variates
$z_i$, namely, $(f_i - \phi_i)/\phi_i^{1/2}$, whose squares are added to produce $\chi_s^2$, and so
reduces the degrees of freedom by 1. The estimators used for the parameters need
not be the maximum likelihood ones, as long as they are asymptotically normal
and asymptotically most efficient (see § 5.6).


EXAMPLE 2 Rutherford and Geiger (Phil. Mag. 20, 1910, p. 698) obtained
the following distribution of the number (x) of α-particles emitted from a disc
in 7.5 sec.

Assuming that the distribution is Poisson with parameter $\mu$, the maximum
likelihood estimator of $\mu$ is the arithmetic mean $\bar{x}$, which is 3.870. The calculated
frequency $\phi$ for any given x is $N e^{-\mu} \mu^x / x!$ with $\mu = 3.870$ and N = 2608. The
last two values of $\phi$ in Table 10.2 have been pooled to give a total greater than 1.
For 13 classes, with one parameter estimated from the sample, the number of
degrees of freedom is 11 and the probability P of a value of $\chi^2$ as great as 12.99
is about 0.30. The hypothesis that the distribution in the population is Poisson is
therefore not rejected by this test.


TABLE 10.2

  x       f       $\phi$       $(f - \phi)^2/\phi$
  0      57      54.40      0.124
  1     203     210.52      0.269
  2     383     407.36      1.457
  3     525     525.50      0.001
  4     532     508.42      1.094
  5     408     393.52      0.533
  6     273     253.82      1.450
  7     139     140.32      0.012
  8      45      67.88      7.712
  9      27      29.19      0.164
 10      10      11.30      0.150
 11       4       3.97      0.000
 12       2       1.28 }
≥13       0       0.52 }    0.022   (pooled $\phi$ = 1.80)
       2608    2608.00     12.99
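
The fitting procedure of Example 2 can be sketched in Python as follows. The sketch assumes the Table 10.2 counts with f = 27 at x = 9 (the value that makes the column total 2608), estimates the Poisson mean from the sample, and pools every class from x = 12 upward; the function name `poisson_fit_chi_square` and the argument `last_class` are hypothetical.

```python
import math

# Observed counts from Table 10.2 (x -> f); no observations at x >= 13.
counts = {0: 57, 1: 203, 2: 383, 3: 525, 4: 532, 5: 408, 6: 273,
          7: 139, 8: 45, 9: 27, 10: 10, 11: 4, 12: 2}

def poisson_fit_chi_square(counts, last_class):
    """Fit a Poisson law by the sample mean and compute chi-square.

    Classes 0 .. last_class - 1 are kept separate; everything from
    last_class upward is pooled into one final class.
    """
    n = sum(counts.values())
    mean = sum(x * f for x, f in counts.items()) / n
    expected = [n * math.exp(-mean) * mean ** x / math.factorial(x)
                for x in range(last_class)]
    expected.append(n - sum(expected))          # pooled upper tail
    observed = [counts.get(x, 0) for x in range(last_class)]
    observed.append(n - sum(observed))          # pooled observed count
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1 - 1                  # one parameter estimated
    return mean, statistic, df

mean, statistic, df = poisson_fit_chi_square(counts, last_class=12)
print(f"mean = {mean:.3f}, chi-square = {statistic:.2f}, d.f. = {df}")
```

Because the sketch carries the mean to full precision instead of rounding it to 3.870, the statistic may differ from the tabulated total in the second decimal place.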


* 10.4 The Power of the Chi-Square Test The power function of the $\chi^2$ test
cannot be computed unless a specific alternative hypothesis $H_1$ is considered.
We might for instance suppose in Example 2 that the distribution, instead of
being Poisson, is really binomial, and then we could find the probability of
rejecting $H_0$ when it should be rejected. Another alternative hypothesis might
be that the observations are individually Poisson, but with means that vary in
some systematic way.

For very large samples, the chi-square test will usually reject the hypothesis
that the distribution follows some simple assumed law, because small variations
from this law will tend to show up in so large a sample. With small samples the
test is not very sensitive. There is a parametric test of the Poisson distribution
which also depends on $\chi^2$, and which is more powerful than the ordinary chi-
square test. In a Poisson population the expectation and the variance are equal,
and in a sample of size N from such a population the ratio $ns^2/m$ (where $m$
and $s^2$ are the sample mean and variance respectively and n = N − 1) is dis-
tributed approximately as $\chi^2$ with n degrees of freedom. In Example 2 the ratio
of variance to mean is 0.95, and $ns^2/m$ is 2476. For such a large value, the normal
approximation to $\chi^2$ is adequate (§ 4.6), and we find that the probability of a
value as low as this with 2607 d.f. is about 0.033. The variance is therefore
significantly lower than the mean, and we might be inclined to reject the
hypothesis of a Poisson distribution on account of this test, whereas the
ordinary chi-square test would lead to acceptance.
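
As a rough check on the variance-to-mean test just described, the following Python sketch recomputes $ns^2/m$ from the Rutherford and Geiger counts of Example 2. It assumes that the normal approximation of § 4.6 is the familiar one in which $\sqrt{2\chi^2} - \sqrt{2n - 1}$ is treated as a standard normal deviate; that section is not reproduced here, so this is an assumption, and the variable names are illustrative only.

```python
import math

# Counts of Example 2 (x -> f), taking f = 27 at x = 9 so that N = 2608.
counts = {0: 57, 1: 203, 2: 383, 3: 525, 4: 532, 5: 408, 6: 273,
          7: 139, 8: 45, 9: 27, 10: 10, 11: 4, 12: 2}

n_obs = sum(counts.values())
mean = sum(x * f for x, f in counts.items()) / n_obs
sum_sq = sum(f * (x - mean) ** 2 for x, f in counts.items())
df = n_obs - 1

variance = sum_sq / df
dispersion = sum_sq / mean            # this is n * s^2 / m

# Assumed normal approximation: sqrt(2 chi^2) - sqrt(2 n - 1) ~ N(0, 1).
z = math.sqrt(2.0 * dispersion) - math.sqrt(2.0 * df - 1.0)
p_lower = 0.5 * math.erfc(-z / math.sqrt(2.0))   # P(value as low as observed)

print(f"s^2/m = {variance / mean:.3f}, n s^2/m = {dispersion:.0f}, "
      f"z = {z:.2f}, P = {p_lower:.3f}")
```

A lower-tail probability is used because the question is whether the variance is too small relative to the mean.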
10.5 The Chi-Square Test for a Grouped Distribution If a measured variate
X, which may be supposed to have a continuous distribution in the population, 
is grouped into classes in a sample, the expected frequencies corresponding to 
these classes may be calculated if we are prepared to make some assumptions 
about the population. Generally, we assume that the population distribution 
follows some relatively simple and plausible law, such as the normal law or one 
of the other Pearson types, with parameters that are estimated from the sample 
itself. The agreement between the observed and calculated frequencies may then 
be tested by the chi-square test. The number of degrees of freedom is k — s — 1, 
where k is the number of classes in the sample (after pooling if necessary) and s 
is the number of parameters estimated from the sample. (One further degree of
freedom is lost because of the forced agreement between the total frequencies,
calculated and observed.)


EXAMPLE 3 The data of Table 2.2 may perhaps arise from a normal distri-
bution of weights in the population sampled (eight-year-old Glasgow girls).
If they do so, unbiased estimates of the expectation $\mu$ and the variance $\sigma^2$ for this
distribution are provided by the statistics $k_1$ and $k_2$ which were calculated in
§ 5.10, namely, 47.71 lb and 33.34 lb². In order to find the expected frequencies we
need to obtain the standardized z-values corresponding to the class boundaries,
these values being given by

(10.5.1)  $z_c = \dfrac{x_c - \mu}{\sigma} = \dfrac{x_c - 47.71}{5.774}$

For each z a table of the normal law gives the probability of a value not
greater than z, i.e.,

(10.5.2)  $\Phi(z) = \int_{-\infty}^{z} \phi(u)\,du$
E 
TABLE 10.3

Class boundary     Observed            $z_c$        $\Phi(z_c)$     $\phi_i = N\,\Delta\Phi(z)$
   ($x_c$)        frequency ($f_i$)
    (−∞)                              (−∞)        0.0000
                        1                                        2.5
    31.5                             −2.808       0.0025
                       14                                       14.7
    35.5                             −2.115       0.0172
                       56                                       60.3
    39.5                             −1.422       0.0775
                      172                                      155.5
    43.5                             −0.729       0.2330
                      245                                      252.4
    47.5                             −0.0367      0.4854
                      263                                      258.7
    51.5                              0.656       0.7441
                      156                                      167.2
    55.5                              1.349       0.9113
                       67                                       68.1
    59.5                              2.042       0.9794
                       23                                       17.5
    63.5                              2.734       0.9969
                        3                                        3.1
    67.5                              3.427       0.9997
    (∞)                               (∞)         1.0000
                     1000                                     1000.0
The differences between successive values of $\Phi(z)$, denoted in Table 10.3
by $\Delta\Phi(z)$, are the probabilities, for a random item, of falling in the corres-
ponding classes, so that the expected frequencies are given by

(10.5.3)  $\phi_i = N\,\Delta\Phi(z)$

These are calculated in Table 10.3.

The first class includes all values of z from −∞ up to −2.808; the last class
includes all values from 2.734 up to ∞ (although actually no values were
observed beyond 3.427). From the columns for $f_i$ and $\phi_i$ we obtain

$\chi_s^2 = 5.78$ with 10 − 3 = 7 d.f.


There is probably no real need to pool, although many writers would advocate
pooling the first two and the last two classes. If this is done $\chi_s^2$ becomes 4.82
... either procedure turns out to be about ...

... sample we find $g_1 = 0.114$ and $g_2 =$ ...
normal population, the standard errors of $g_1$ and $g_2$, found from Eqs. (8.18.4)
and (8.18.5), are 0.077 and 0.154 respectively, so that the observed values differ
... standard errors. The probabilities of ... 0.14 and 0.50, and therefore, even
by these tests, the assumption of normality is justified.
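
The grouped-normal calculation of Example 3 can be reproduced approximately in Python. The sketch below takes the class boundaries and frequencies of Table 10.3 and the rounded estimates 47.71 and 5.774 quoted in the example, and evaluates $\Phi$ through `math.erfc` instead of a printed table, so the results can differ from hand calculation in the second decimal. The helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def grouped_normal_chi_square(boundaries, observed, mean, sd):
    """Expected frequencies N * delta-Phi and chi-square for grouped data.

    boundaries holds the finite class boundaries; the first and last
    classes are open-ended, so len(observed) == len(boundaries) + 1.
    """
    n = sum(observed)
    cdf = [0.0] + [normal_cdf((b - mean) / sd) for b in boundaries] + [1.0]
    expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return expected, statistic

def pool_ends(values):
    """Merge the first two and the last two classes."""
    return [values[0] + values[1]] + values[2:-2] + [values[-2] + values[-1]]

boundaries = [31.5, 35.5, 39.5, 43.5, 47.5, 51.5, 55.5, 59.5, 63.5]
observed = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]

expected, chi_sq = grouped_normal_chi_square(boundaries, observed, 47.71, 5.774)
pooled_obs, pooled_exp = pool_ends(observed), pool_ends(expected)
chi_sq_pooled = sum((f - e) ** 2 / e for f, e in zip(pooled_obs, pooled_exp))

# Two parameters are estimated, so d.f. = (number of classes) - 3.
print(f"unpooled: {chi_sq:.2f} with {len(observed) - 3} d.f.")
print(f"pooled:   {chi_sq_pooled:.2f} with {len(pooled_obs) - 3} d.f.")
```

The pooled figure should land close to the 4.82 quoted in the text for the pooled classes.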


Like the chi-square test, this test is one of agreement between an observed
distribution and an assumed theoretical one, but it is based on the cumulative
distribution function rather than on the class frequencies. It may also be used
to test whether two samples may be regarded as coming from the same population.

For the one-sample test, suppose that $H_0$ specifies a distribution function
... probability 1. The test function is the least upper bound (... the maximum)
of the absolute deviation of $S_N(x)$ from $F(x)$:

(10.6.2)  $D_N = \underset{(x)}{\mathrm{l.u.b.}}\ |S_N(x) - F(x)|$
and the usefulness of the test depends on the fact that the distribution of $D_N$
does not depend on the form of $F(x)$, as long as $F(x)$ is continuous. We can
therefore take $F(x) = x$, $0 \le x \le 1$.

It was proved by Kolmogorov, and the proof was simplified by Feller [6],
that for any given $\lambda > 0$,

(10.6.3)  $\lim_{N \to \infty} P(N^{1/2} D_N > \lambda) = L(\lambda)$

where

$L(\lambda) = 2 \sum_{n=1}^{\infty} (-1)^{n+1} e^{-2n^2\lambda^2}$

Thus for $\lambda = 1.36$, $L(\lambda) = 0.05$. The asymptotic probability that $D_N$ exceeds
$1.36\,N^{-1/2}$ is therefore 0.05.
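
The limiting tail probability $L(\lambda)$ is easy to evaluate numerically, since the alternating series converges very quickly for values of $\lambda$ of practical interest. The following Python sketch (the name `kolmogorov_tail` is hypothetical) sums the series of Eq. (10.6.3) and derives an asymptotic 5% critical value of $D_N$ for an illustrative sample size.

```python
import math

def kolmogorov_tail(lam, tol=1e-15, max_terms=1000):
    """L(lambda) = 2 * sum_{n>=1} (-1)^(n+1) * exp(-2 n^2 lambda^2)."""
    if lam <= 0:
        return 1.0
    total = 0.0
    for n in range(1, max_terms + 1):
        term = (-1) ** (n + 1) * math.exp(-2.0 * n * n * lam * lam)
        total += term
        if abs(term) < tol:
            break
    # Clamp against rounding noise for very small lambda.
    return min(1.0, max(0.0, 2.0 * total))

print(f"L(1.36) = {kolmogorov_tail(1.36):.4f}")

# Asymptotic 5% critical value of D_N for an illustrative N = 100.
n_items = 100
print(f"critical D for N = {n_items}: {1.36 / math.sqrt(n_items):.3f}")
```

Only the first term matters appreciably near $\lambda = 1.36$; the second term is already smaller than $10^{-5}$.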

Critical values for Dy, for small values of N, were calculated by Massey [7], 
and a table is given in Appendix B.6. This table gives the values which are 
exceeded with the given probabilities and so corresponds to the upper tail of the 
distribution of D. For values of N larger than 35 the asymptotic distribution of 
Eq. (10.6.3) may be used. This is given in the last line of the table. 


EXAMPLE 4 (Miller [8]) Can the following five numbers be regarded as a
random choice from the interval 0 to 1: 0.52, 0.65, 0.13, 0.71, 0.58?

Here $S(x)$ is a step function with steps of equal height at the observed values
of x (see Figure 45). The graph of $F(x)$ is a straight line from the origin to (1, 1).

FIG. 45 KOLMOGOROV TEST

The maximum absolute deviation is 0.32, so that the probability is more than
0.20 that this value of $D_5$ would be exceeded if $H_0$ were true. There is therefore
no reason to reject $H_0$.

For N = 5 there is a probability 0.05 that D > 0.565. We should therefore
expect that in about 95% of trials a random sample of 5 from a uniform distri-
bution on the interval (0, 1) will give a step function lying inside the band
bounded by $F(x) = x \pm 0.565$.
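
The maximum deviation in Example 4 can be found mechanically. Because the sample distribution function is a step function, the largest gap from a continuous $F(x)$ must occur at one of the observed values, either just before or just after the step. The Python sketch below (function name hypothetical) checks both sides of every step.

```python
def kolmogorov_statistic(sample, cdf):
    """Largest absolute gap between the sample step function and cdf.

    At each ordered observation the step function jumps from i/N to
    (i + 1)/N, so both heights are compared with cdf(x).
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

# Data of Example 4, tested against the uniform law F(x) = x on (0, 1).
numbers = [0.52, 0.65, 0.13, 0.71, 0.58]
d5 = kolmogorov_statistic(numbers, lambda x: x)
print(f"D = {d5:.2f}")
print("reject at 5% level" if d5 > 0.565 else "no reason to reject")
```

The critical value 0.565 for N = 5 is the one quoted in the example; for other sample sizes it would have to be read from Table B.6.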
Corresponding to an observed step function and a given N, we can form a
confidence belt, by drawing the curves for $S_N(x) \pm D_N$, where $D_N$ is the critical
value for the given N and a given level of significance. Any theoretical $F(x)$ which
lies wholly within the belt will not be rejected by the data, at this level.


EXAMPLE 5 In Table 10.4, x is the logarithm of soil resistance (in ohms) at
a certain depth and k is the cumulative frequency of observations.

TABLE 10.4

  x         k     $S_N(x)$
1           0     0
1.699       2     0.056
2          11     0.306
2.699      18     0.500
3          19     0.528
3.699      22     0.611
4          23     0.639
4.477      27     0.750
5          30     0.833
5.699      34     0.944
6          36     1.000

For N = 36 we may take $D_N = 0.23$ for a 95% confidence belt. The belt is
drawn in Figure 46. Any $F(x)$ which lies wholly within the belt could be
accepted with 95% confidence.

FIG. 46 95% CONFIDENCE BELT FOR F(x) (horizontal axis: log of resistance x)
* 10.7 The Power of the Kolmogorov Test This test is correctly used only
when the hypothetical distribution is completely specified, as, for instance, in
testing whether a distribution is normal with given mean and variance. In
practice, when normality is tested, the mean and variance are generally estimated
from the sample itself. The effect of this on the $\chi^2$ test is merely to reduce the
degrees of freedom, but the effect on the Kolmogorov test is not precisely known.
It may be expected that the general effect will be to reduce the critical level of D,
so that the use of the tabulated values in such cases will be conservative.

In order to obtain the power of the Kolmogorov test, we need to have an
alternative hypothesis $H_1$ to the hypothesis $H_0$ under examination. If $H_0$
states that the population distribution function is $F_0(x)$ and $H_1$ states that it is
$F_1(x)$ and if the maximum absolute difference between these is $\delta$, then, as
Massey [7] has shown, the power of the test is at least

$1 - \Phi(2\delta N^{1/2} + 2D_N) + \Phi(2\delta N^{1/2} - 2D_N)$

where $D_N$ is the critical value corresponding to the level of significance $\alpha$. The
actual power is likely to be considerably greater than this.

The power of the $\chi^2$ test in general is not known, but in some cases where
comparison with the Kolmogorov test is possible, it appears that the latter is
much the more powerful of the two. The least maximum absolute deviation of
the true distribution function $F_1(x)$ from an assumed distribution function
$F_0(x)$, which will lead to rejection of the latter with probability 0.50, has been
calculated for both tests, at the 5% and 1% significance levels, and is smaller for
the Kolmogorov test by a factor of nearly 2 (for N between 200 and 2000).

The Kolmogorov test is not applicable to discrete variates, whereas the $\chi^2$
test can be used for these. In other respects the former test seems to have
considerable advantages.


10.8 The Kolmogorov-Smirnov Test for Two Samples This test is con-
cerned with the agreement between two sets of observed values, and the null
hypothesis is that the two samples come from populations with the same
distribution function $F(x)$. The test statistic is

(10.8.1)  $D_{mn} = \underset{(x)}{\mathrm{l.u.b.}}\ |S_m(x) - S_n(x)|$

where m and n are the sample sizes and $S_m(x)$ has the same meaning as before.

If it is desired to test the null hypothesis against the alternative hypothesis
that the first sample comes from a population in which $F(x)$ is greater than it is
for the same x in the second population, we should use the actual difference
$S_m(x) - S_n(x)$ instead of its absolute value.

The asymptotic distribution of D was worked out by Smirnov, who
showed that, as long as $F(x)$ is continuous,

(10.8.2)  $\lim_{m,n \to \infty} P(N^{1/2} D_{mn} \ge \lambda) = L(\lambda)$

where $L(\lambda)$ is defined as in Eq. (10.6.3), $N = mn/(m + n)$, and it is supposed that
m and n both tend to ∞ in such a way that the ratio m/n is finite.

A table of probabilities that $D < k/n$, for the case m = n, has been compiled
by Massey [9] and is the basis for the following table of critical values. The null
hypothesis is rejected at significance level $\alpha$ if, for samples of size n, the
maximum difference between the cumulative frequencies for any x is k or more.


TABLE 10.5

      $\alpha = 0.05$                  $\alpha = 0.01$
  n    k     n    k           n    k     n    k
  4    4    19    9           5    5    19   10
  5    5    20    9           6    6    20   11
  6    5    21    9           7    6    21   11
  7    6    22    9           8    7    22   11
  8    6    23   10           9    7    23   11
  9    6    24   10          10    8    24   12
 10    7    25   10          11    8    25   12
 11    7    26   10          12    8    26   12
 12    7    27   10          13    9    27   12
 13    7    28   11          14    9    28   13
 14    8    29   11          15    9    29   13
 15    8    30   11          16   10    30   13
 16    8    35   12          17   10    31   13
 17    8    40   13          18   10    32   13
 18    9


For large values of m and n, the values of $\lambda$ in Eq. (10.8.2) which would justify
rejection of the null hypothesis at level of significance $\alpha$ are given in Table 10.6,
calculated by Smirnov [10].

TABLE 10.6

$\alpha$     |  0.10   0.05   0.025   0.01   0.005   0.001
$\lambda$    |  1.22   1.36   1.48    1.63   1.73    1.95


EXAMPLE 6 Suppose that in two samples of sizes 55 and 60 respectively
we find on drawing the cumulative step functions $S_m(x)$ and $S_n(x)$ that the maxi-
mum absolute deviation is 0.25. Then $N^{1/2} D_{mn} = [(55)(60)/115]^{1/2}(0.25)
= 1.34$. The probability of a value as great as this is a little more than 0.05, so
that the difference is not quite large enough to reject the null hypothesis at the
0.05 level.
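
A small Python sketch of the two-sample calculation follows. The function names are hypothetical, and the two short lists used to exercise `two_sample_statistic` are invented purely for illustration; only the sizes 55 and 60 and the deviation 0.25 come from Example 6.

```python
import bisect
import math

def two_sample_statistic(first, second):
    """Largest absolute gap between the two sample step functions."""
    a, b = sorted(first), sorted(second)
    m, n = len(a), len(b)
    d = 0.0
    for x in a + b:
        s_m = bisect.bisect_right(a, x) / m   # proportion of first sample <= x
        s_n = bisect.bisect_right(b, x) / n   # proportion of second sample <= x
        d = max(d, abs(s_m - s_n))
    return d

def scaled_statistic(d, m, n):
    """N^(1/2) * D with N = m n / (m + n), as in Eq. (10.8.2)."""
    return math.sqrt(m * n / (m + n)) * d

# Example 6: sizes 55 and 60, maximum deviation 0.25.
lam = scaled_statistic(0.25, 55, 60)
print(f"scaled statistic = {lam:.2f}")
print("reject at 0.05 level" if lam >= 1.36 else "do not reject at 0.05 level")

# Hypothetical toy samples, only to show the statistic itself.
print(two_sample_statistic([1.2, 2.5, 3.1, 4.8], [2.0, 3.9, 5.5, 6.1]))
```

The comparison value 1.36 is the 0.05 entry of Table 10.6, which applies only when both samples are large.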


10.9 The Sign Test for Paired Samples The ordinary t-test for the signifi-
cance of an observed effect in paired samples (§ 8.9) assumes that all the paired
differences can be regarded as independently and normally distributed with a
common variance. Sometimes the pairs are observed under widely different
conditions and the assumptions of normality and of common variance seem
unwarranted. If so, the sign test may be used. This test is very simple to apply
and merely assumes that the median of the population of differences is $\mu$, so that
the probability that an observed $d_i > \mu$ is the same as the probability that
$d_i < \mu$.

The null hypothesis is that $\mu = 0$ and the alternative hypothesis (for a one-
tailed test) is that $\mu > 0$, or that $\mu < 0$. For a two-tailed test the alternative is
that $|\mu| > 0$.

On the null hypothesis, the expected numbers of positive signs and of negative
signs among the differences in a sample of N pairs will be N/2. The sampling
distribution of the number of positive (or negative) signs will be binomial with
$\theta = 1/2$. A table of cumulative binomial probabilities for $\theta = 1/2$ is included in the
Appendix, Table B.7. This gives, for N between 5 and 25 inclusive, the proba-
bilities of occurrence of r or fewer successes, where $r < N/2$. For the two-tailed
test, the probabilities should be doubled.

In practice, we let r be the number of less frequent signs among the differences
$d_i$ ($= x_{1i} - x_{2i}$), so that the condition $r < N/2$ is satisfied. If any observed
differences happen to be exactly zero, they are not counted, and the sample
size is correspondingly reduced.

EXAMPLE 7 The data in Table 10.7 represent yields in bushels for two
varieties of apples, A and B, each pair of trees being planted near together under
similar conditions of soil, moisture, etc. The separate pairs are, however,
scattered over various localities.

TABLE 10.7

$x_1$(A)    $x_2$(B)    $x_1 - x_2$    $x_1 - (x_2 + 1)$
  13        16         −3          −4
  12        11          1           0
  10         8          2           1
   6         6          0          −1
  13        12          1           0
  15        15          0          −1
  19        14          5           4
  10         9          1           0
  11         8          3           2
  11        11          0          −1
  13        13          0          −1
   9        10         −1          −2
  14        12          2           1
  12        11          1           0
  12         9          3           2


In the column of differences, $x_1 - x_2$, there are two minus signs in 11 non-
zero items. The probability of two or fewer minus signs is 0.0327, so that the
difference is significant, if we assume that A's median yield is certainly not lower
than B's. If the difference could be either way, the probability is doubled, and
the observed difference would not be significant at the 5% level.
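
The sign-test probabilities quoted for Table 10.7 follow directly from the binomial distribution with $\theta = 1/2$. The Python sketch below (the function `sign_test` and its `shift` argument are hypothetical names) counts the signs, discards zero differences, and sums the binomial terms exactly; setting `shift=1` adds one bushel to every $x_2$ value before differencing.

```python
from math import comb

# Yields from Table 10.7.
x1 = [13, 12, 10, 6, 13, 15, 19, 10, 11, 11, 13, 9, 14, 12, 12]   # variety A
x2 = [16, 11, 8, 6, 12, 15, 14, 9, 8, 11, 13, 10, 12, 11, 9]      # variety B

def sign_test(first, second, shift=0):
    """Return (N, r, one-tailed P) for the sign test on first - (second + shift).

    Zero differences are dropped; r is the count of the less frequent sign;
    P is the probability of r or fewer such signs when theta = 1/2.
    """
    diffs = [a - (b + shift) for a, b in zip(first, second)]
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    minus = sum(1 for d in nonzero if d < 0)
    r = min(minus, n - minus)
    p_one_tailed = sum(comb(n, k) for k in range(r + 1)) / 2 ** n
    return n, r, p_one_tailed

n, r, p = sign_test(x1, x2)
print(f"N = {n}, r = {r}, one-tailed P = {p:.4f}, two-tailed P = {2 * p:.4f}")

n1, r1, p1 = sign_test(x1, x2, shift=1)
print(f"with one bushel added to B: N = {n1}, r = {r1}, P = {p1:.3f}")
```

Doubling the one-tailed probability gives the two-tailed value, as described in § 10.9.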
When, as in this case, we are dealing with a type of measurement which has a
well-defined unit and a zero, we can employ the sign test to decide whether or
not the true difference reaches a certain value. Thus, in Table 10.7, if we add
one bushel to each of the $x_2$ values and re-compute the differences, we get
Column 4, which has six minus signs and five plus signs. The value of r is now
5 and the corresponding probability is 0.500. There is therefore no evidence that
the difference between the yields of A and B is as great as one bushel, in favour of A.

For values of N larger than 25, the normal approximation to the binomial
may be used. The probability of r or fewer successes is approximately $\Phi(z)$,
where

(10.9.1)  $z = \dfrac{2r + 1 - N}{N^{1/2}}$
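
Eq. (10.9.1) is the usual continuity-corrected approximation: the binomial mean is N/2 and the standard deviation is $N^{1/2}/2$, so $(r + \tfrac12 - N/2)/(N^{1/2}/2)$ reduces to $(2r + 1 - N)/N^{1/2}$. The Python sketch below compares the approximation with the exact binomial sum; the values N = 30 and r = 10 are hypothetical and chosen only for illustration.

```python
import math
from math import comb

def exact_probability(r, n):
    """Exact P(r or fewer successes) for a binomial with theta = 1/2."""
    return sum(comb(n, k) for k in range(r + 1)) / 2 ** n

def normal_approximation(r, n):
    """Phi(z) with z = (2r + 1 - N) / sqrt(N), as in Eq. (10.9.1)."""
    z = (2 * r + 1 - n) / math.sqrt(n)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

n_pairs, r_less = 30, 10          # hypothetical values for illustration
print(f"exact  = {exact_probability(r_less, n_pairs):.4f}")
print(f"approx = {normal_approximation(r_less, n_pairs):.4f}")
```

For sample sizes covered by Table B.7 the exact probabilities are preferable; the approximation is meant for N larger than 25.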

The power-efficiency of the sign test is about 95% for N = 6, but diminishes
as N increases to an asymptotic value of $2/\pi$ = 63%. This means that the sign
test has about the same power for a sample of size 100, say, as the most powerful
test against the same alternative for a sample of size 63. The most power-
ful test would in fact be the Student t-test, provided that the assumptions for
this test are met. The sign test has the advantage that it can be used in circum-
stances where the t-test is not applicable. For samples of size 13 the efficiency
is still about 75%, so that there is comparatively little loss of power in using
the simpler sign test for samples of moderate size, even though a t-test could
legitimately be used.


10.10 The Wilcoxon Signed-Rank Test This is another test used on matched
pairs, [16], more powerful than the sign test because it gives more weight to large
numerical differences between the members of a pair than to small differences.
The 2N subjects are divided into N pairs, each pair as evenly matched as possible.
If the effect of a difference of treatments is to be investigated, the choice as to
which member of any pair has treatment A and which has treatment B is made
at random.

The observed differences $d_i = x_{1i} - x_{2i}$ (where $x_1$ refers to A and $x_2$ to B)
are ranked in increasing order of absolute magnitude and the sum of the ranks is
computed for all the differences of like sign. The test statistic T is the smaller
of these two rank-sums (one for positive $d_i$ and one for negative $d_i$). Pairs with
$d_i = 0$ are not counted.

Since the variate is supposed continuous, ties should occur only rarely, but
will sometimes happen because of the limited accuracy of measurement. If two or
more of the $d_i$ have the same magnitude they are given a rank which is the average
of the ranks they would have had if they had differed slightly. Thus if the three
numerically lowest values of $d_i$ happened to be −1, −1 and 1, they would all be
given rank 2, which is the mean of ranks 1, 2 and 3. (The signs are disregarded.)

On the null hypothesis, the expected values of the two rank-sums would be
equal. If the positive rank-sum is the smaller, and is equal to or less than the
value given for the appropriate N in Table 10.8, the null hypothesis will be
rejected at the corresponding level of significance α, in favour of the alternative
hypothesis that μ < 0. If the negative rank-sum is the smaller, the alternative
will be that μ > 0. If a two-tailed test is required, the alternative being that
|μ| > 0, the given levels of significance should be doubled.


TABLE 10.8 (a)

 N    α = .025   α = .01   α = .005
 6        0         -         -
 7        2         0         -
 8        4         2         0
 9        6         3         2
10        8         5         3
11       11         7         5
12       14        10         7
13       17        13        10
14       21        16        13
15       25        20        16
16       30        24        20
17       35        28        23
18       40        33        28
19       46        38        32
20       52        43        38
21       59        49        43
22       66        56        49
23       73        62        55
24       81        69        61
25       89        77        68

(a) Adapted from Table I of reference [16] with the kind permission of the author,
F. Wilcoxon, and the publishers, American Cyanamid Co.


For larger values of N, T is approximately normally distributed with mean and
variance given by

(10.10.1)       $\mu = \dfrac{N(N + 1)}{4}, \qquad \sigma^2 = \dfrac{N(N + 1)(2N + 1)}{24}$

This means that $[\,|T - N(N + 1)/4| - \tfrac{1}{2}\,]/\sigma$ is approximately a standard normal
variate. Thus at the 0.025 significance level, for which z = 1.96, with N = 25,
we find $T = 162 - 1.96(1381)^{1/2} = 89.1$, in agreement with the last line of
Table 10.8.
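
The arithmetic of this check is easy to repeat for other values of N. The sketch below assumes the mean and variance of Eq. (10.10.1); the function name `wilcoxon_lower_critical` is introduced here only for illustration.

```python
from math import sqrt

def wilcoxon_lower_critical(n, z):
    """Approximate lower critical value of T from the normal approximation.

    Solves (mean - T - 1/2) / sigma = z for T, i.e. the continuity-corrected
    deviation of T below its mean is set equal to z standard deviations.
    """
    mean = n * (n + 1) / 4                         # N(N + 1)/4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)   # square root of the variance
    return mean - 0.5 - z * sigma

t_crit = wilcoxon_lower_critical(25, 1.96)
print(round(t_crit, 2))
```

For N = 25 the result lies between 89 and 90, so the largest integer value of T leading to rejection is 89, as tabulated.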
EXAMPLE 8 For the data of Table 10.7, the ranks of $x_1 - x_2$ (excluding
the zero values) are as shown below:

d      −3    1    2    1    5    1    3   −1    2    1    3
rank    9    3   6½    3   11    3    9    3   6½    3    9


The sum of ranks for the two negative d; is 12 and for the nine positive d; 
is 54. Therefore Т = 12 and N = 12. The hypothesis that the expectation of d; 
is positive is therefore acceptable at the 2.5% level. The hypothesis that the 
expectation is not zero is acceptable at the 5% level. The latter decision (based 
on the two-tailed test) is the one that would normally be taken unless there is 
good reason to believe, before the data are obtained, that if there is any difference 
it can only be in one direction. 
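
The rank-sums of Example 8 can be reproduced mechanically. This is a minimal sketch, assuming average ranks for tied magnitudes as described earlier; the function name `signed_rank_sums` is not from the text.

```python
def signed_rank_sums(diffs):
    """Return (negative rank-sum, positive rank-sum).

    Zero differences are dropped; differences of equal magnitude share the
    average of the ranks they would otherwise occupy.
    """
    nonzero = [d for d in diffs if d != 0]
    ordered = sorted(nonzero, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # positions i..j-1 hold ranks i+1..j; their mean is (i + 1 + j) / 2
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2
        i = j
    neg = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    pos = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    return neg, pos

# Non-zero differences x1 - x2 from Table 10.7.
d = [-3, 1, 2, 1, 5, 1, 3, -1, 2, 1, 3]
neg, pos = signed_rank_sums(d)
T = min(neg, pos)
print(neg, pos, T)
```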

The power-efficiency of the Wilcoxon test is remarkably high. Asymptotically,
it is 3/π = 95.5%, as compared with the t-test, in circumstances where
both tests would be applicable.


* 10.11 The Walsh Test This is a test with assumptions similar to those for
the Wilcoxon signed-rank test, but depending on the averages of pairs of
differences. The differences $d_i$ are arranged in increasing order, taking account
of sign. The null hypothesis is that the median μ of all these differences is zero.
The alternative (two-tailed) hypothesis is that this median is not zero.

The test statistics used are various combinations of the differences. Thus,
for N = 5, with a two-tailed test, we should reject $H_0$ at the level α = 0.125 if
either $\tfrac{1}{2}(d_4 + d_5) < 0$ or $\tfrac{1}{2}(d_1 + d_2) > 0$. We should reject at the level α = 0.062
if either $d_5 < 0$ or $d_1 > 0$. If we felt in advance that μ was bound to be negative
if not zero, we could reject $H_0$ at the level 0.062 if $\tfrac{1}{2}(d_4 + d_5) < 0$ or at the level
0.031 if $d_5 < 0$. Table B.8 in the Appendix, from Walsh [11], gives for values
of N from 5 to 15 the various tests which may be applied at the significance levels
indicated, for both one-tailed and two-tailed tests.

Some of these tests are equivalent to the Wilcoxon signed-rank test, but
others are not. The efficiency of the tests is, for the most part, from 95 to 99%,
and is nowhere below 87.5% (for the first test when N = 10, one-tailed).


EXAMPLE 9 Can the following seven observations, arranged in order of
size, be considered as coming from populations with the common median 0.5?

−2.5, −1.5, −1.3, −0.1, 0.4, 0.7, 0.8

If we subtract 0.5 from each of these values, the null hypothesis will be that
μ = 0. The values are

$d_1$     $d_2$     $d_3$     $d_4$     $d_5$     $d_6$     $d_7$
−3.0    −2.0    −1.8    −0.6    −0.1     0.2     0.3

For N = 7, there are eight tests altogether, summarized below:

Sig. level (α)   Tests
0.109            $\max[d_5, \tfrac{1}{2}(d_4 + d_7)] = -0.1$,   $\min[d_3, \tfrac{1}{2}(d_1 + d_4)] = -1.8$
0.047            $\max[d_6, \tfrac{1}{2}(d_5 + d_7)] = 0.2$,    $\min[d_2, \tfrac{1}{2}(d_1 + d_3)] = -2.4$
0.031            $\tfrac{1}{2}(d_6 + d_7) = 0.25$,              $\tfrac{1}{2}(d_1 + d_2) = -2.5$
0.016            $d_7 = 0.3$,                                   $d_1 = -3.0$

Only the first test, at level α = 0.109, leads to rejection of $H_0$, since the value
−0.1 is negative. At the 0.047 level we should accept $H_0$.

Using the Wilcoxon test on the same data, we have T = 5, N = 7, which
would lead us to accept $H_0$ at any level up to 0.05. This does not contradict the
Walsh test, but the latter is more informative.
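
The four two-tailed Walsh statistics for N = 7 can be evaluated directly from the ordered differences. The sketch below covers only the N = 7 tests listed above; the function names are assumptions made for illustration, and a test is taken to reject when its first quantity is negative or its second is positive.

```python
def walsh_statistics_n7(values, median0=0.0):
    """Return {level: (upper statistic, lower statistic)} for N = 7."""
    d = sorted(v - median0 for v in values)
    if len(d) != 7:
        raise ValueError("this sketch covers N = 7 only")
    d1, d2, d3, d4, d5, d6, d7 = d

    def half(a, b):
        return (a + b) / 2

    return {
        0.109: (max(d5, half(d4, d7)), min(d3, half(d1, d4))),
        0.047: (max(d6, half(d5, d7)), min(d2, half(d1, d3))),
        0.031: (half(d6, d7), half(d1, d2)),
        0.016: (d7, d1),
    }

def rejected_levels(stats):
    """Levels at which H0 is rejected: upper statistic < 0 or lower statistic > 0."""
    return [level for level, (upper, lower) in stats.items()
            if upper < 0 or lower > 0]

obs = [-2.5, -1.5, -1.3, -0.1, 0.4, 0.7, 0.8]
stats = walsh_statistics_n7(obs, median0=0.5)
rejected = rejected_levels(stats)
print(rejected)
```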


10.12 The Mann-Whitney U-test This is a test of the null hypothesis that
two independent samples A and B come from populations α and β with the same
distribution. The alternative (one-tailed) hypothesis is that the variate values in
population α are stochastically larger than those in β (or, of course, smaller).
This means that if a is any item from α and b any item from β, the probability
that a > b (or in the other case that a < b) is greater than 0.5. If this is so, the
"bulk" of population α has larger (or smaller) variate values than the bulk of
population β. As before, a two-tailed test may be used, the alternative hypothesis
being that P(a > b) is not 0.5.

Let us choose for sample A the one with the smaller size, if the sizes differ.
If $N_1$ and $N_2$ are the sample sizes and $N = N_1 + N_2$, we rank the combined
samples in increasing order and then find the sum of ranks, $R_1$ and $R_2$, for the
two samples separately. The sum $R_1 + R_2$ must be equal to N(N + 1)/2, and
this fact serves as a check.

The test statistic is given by the smaller of the two quantities:

(10.12.1)       $U = N_1 N_2 + \dfrac{N_1(N_1 + 1)}{2} - R_1, \qquad U' = N_1 N_2 - U$

An equivalent test using $R_1$ was first proposed by Wilcoxon, but Mann and
Whitney gave a more complete treatment, with more extensive tables.

This quantity U is equal to the number of times that an item in A precedes
in the ranking an item in B. U′ is the number of times that an item in B precedes
an item in A. If P(a > b) is large, most items in B will have lower ranks than
most items in A and U will be small. The smallness of U determines whether the
null hypothesis should be rejected.
EXAMPLE 10 The values of x for two samples are as shown:

Sample A ($N_1$ = 8)      Sample B ($N_2$ = 9)

 $x_1$     Rank           $x_2$     Rank
15.2      7            11.5      3
 8.6      1            12.6      5
 9.3      2            19.4     13
14.4      6            21.3     14
15.6      8            32.5     17
11.8      4            18.6     12
16.3      9            17.0     10
17.8     11            23.4     15
                       29.6     16
$R_1$ = 48               $R_2$ = 105

Here N = 17, $\tfrac{1}{2}N(N + 1) = 153 = R_1 + R_2$.

From Eq. (1), U = 60, U′ = 72 − 60 = 12.

The first item in B (with rank 3) precedes six items in A, the second item precedes
five items in A, and the seventh item (with rank 10) precedes one item in A.
None of the other items in B precedes anything in A. The total of precedences
is 12, agreeing with the calculation of U′. For values of $N_1$ and $N_2$ which are
moderately large (say 9 or more) the sampling distribution of U (or U′) is
approximately normal, with mean and variance given by

(10.12.2)       $\mu = \dfrac{N_1 N_2}{2}, \qquad \sigma^2 = \dfrac{N_1 N_2 (N + 1)}{12}$

This implies that the variate

(10.12.3)       $z = \dfrac{U + \tfrac{1}{2} - \tfrac{1}{2} N_1 N_2}{[N_1 N_2 (N + 1)/12]^{1/2}}$

is approximately a standard normal variate.


For small values of $N_1$ and $N_2$, special tables due to Mann and Whitney [12]
must be used. These give the probabilities of a value of U less than or equal to
that observed. For $N_2 \ge 9$, Table B.9 in the appendix may be used. This gives
for selected significance levels, and for selected values of $N_1$ and $N_2$, critical
values of U. Observed values of U (or U′) less than or equal to the tabular value
are cause for rejection of $H_0$ at the level quoted. The level should be doubled for
a two-tailed test.

In the example above, $N_1 = 8$, $N_2 = 9$, U′ = 12. The critical values for
α = .05 and .01 are 18 and 11 respectively. The observed U′ is therefore
significant at the 0.05 level, and almost significant at the 0.01 level, for a one-tailed test.
The normal approximation gives μ = 36, σ² = 108, z = −2.26,
corresponding to a probability of 0.01. The sample B apparently comes from a
population β with on the whole significantly higher x-values than population α,
at about the 1% level of significance.

If, before obtaining the samples, we admitted the possibility that either
population might have on the whole higher values than the other, we should
use a two-tailed test and say that α and β differ at the 2% level of significance.
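
The figures of Example 10 can be recomputed from the raw values. The sketch below assumes no tied observations and applies a continuity correction of one half to the smaller of U and U′; the function name `mann_whitney` is introduced here for illustration only.

```python
from math import sqrt

def mann_whitney(a, b):
    """Return (U, U', z) for two samples without ties."""
    combined = sorted(a + b)
    rank = {v: i + 1 for i, v in enumerate(combined)}  # valid only without ties
    n1, n2 = len(a), len(b)
    r1 = sum(rank[v] for v in a)                       # rank-sum of sample A
    u = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    u_prime = n1 * n2 - u
    mean = n1 * n2 / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    # continuity correction applied to the smaller of the two statistics
    z = (min(u, u_prime) + 0.5 - mean) / sqrt(var)
    return u, u_prime, z

sample_a = [15.2, 8.6, 9.3, 14.4, 15.6, 11.8, 16.3, 17.8]
sample_b = [11.5, 12.6, 19.4, 21.3, 32.5, 18.6, 17.0, 23.4, 29.6]

u, u_prime, z = mann_whitney(sample_a, sample_b)
print(u, u_prime, round(z, 2))
```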

When ties occur they are treated as in the Wilcoxon test. The effect of ties
is to reduce somewhat the value of σ² in the normal approximation. If there are
t observations tied for a particular rank and if $T = (t^3 - t)/12$, the corrected
σ² is given by

(10.12.4)       $\sigma^2 = \dfrac{N_1 N_2}{N(N - 1)} \left[ \dfrac{N^3 - N}{12} - \sum T \right]$

the sum being taken over all groups of tied observations. In most cases the
correction makes little practical difference.

If applied to data which could be analyzed by the parametric t-test, the
Mann-Whitney test has a high power-efficiency, close to 95% and asymptotically
equal to 3/π. For some distributions it is even superior to the t-test in its power
to reject $H_0$.


10.13 Tests of Randomness It is sometimes desirable to test whether a 
series of observations can be regarded as random. The residuals in a time series, 
after removing a trend, may be so tested, for example. According to von Mises, 
the criterion as to whether a series is random or not is that the relative fre- 
quencies in any sub-series of the given series shall be the same as in the original 
series, providing that the series is very long and that the sub-series is picked out 
by some pre-assigned rule. The sub-series could consist, for instance, of every 
third term, or every term corresponding to a prime number, but the rule would 
obviously have to be independent of the nature of the terms picked. In a series 
consisting of zeros and ones, the rule could not be to pick all the zeros. 

This criterion is clearly not a practical one for testing the randomness of a
given finite series. Various tests are actually used in such a case. One is to
determine the relative frequencies of different kinds of terms. In the series of
digits of the decimal expansion of π (3.14159 . . .), now known to 10,000 places,
one can count the numbers of 0's, 1's, 2's, etc. If the distribution were random,
one would expect equal numbers of each digit. The actual distribution of 10,001
digits is as shown in the following table:

TABLE 10.9

Digit         0     1     2     3     4     5     6     7     8     9
Frequency   961  1008  1000  1001  1011  1031  1026  1000   953  1010

The $\chi^2$ test for agreement between the observed and calculated values gives
$\chi_s^2 = 6.21$ for nine d.f., so that P = 0.7. The hypothesis of a random
distribution of the digits is certainly not to be rejected, as far as this test goes.

The frequency of pairs of digits can also be compared with expectation, or,
in sets of four consecutive digits, the frequencies of four or three of a kind, two
pairs, etc. (the so-called "poker" test), can be used. The "gap" test uses the
average separation between zeros. All these tests are normally made on sets of
random numbers produced by some mechanical process, and intended to be used
for randomizing in experimental work. It will of course occasionally happen that
some sub-group of random numbers will fail to pass a particular test. This group
should not be used by itself, but may be quite satisfactory as part of a larger
group.


* 10.14 Runs Up and Down Several tests of randomness are based on the
occurrence of runs in a series. Suppose the observed series consists of numbers
$x_1, x_2, x_3 \ldots x_N$, and consider the signs of the differences $d_1 = x_2 - x_1$,
$d_2 = x_3 - x_2 \ldots d_{N-1} = x_N - x_{N-1}$. The sequence of N − 1 signs may look
something like this:

S:  + + − + − − − − + − + + − . . .

The plus signs correspond to runs up in the original series (increasing values of
x), the minus signs correspond to runs down (decreasing values). In the
illustration there are runs up of length 2, 1, 1, and 2, and runs down of length 1, 4, 1
and 1. In this notation three consecutive increasing x terms give a run up of
length 2.


Tests of randomness have been suggested based on the total number of runs
(r), whether up or down, on the number of plus signs (k) in the sequence S, and
on various other properties of the sequence. Moore and Wallis [13] have
described such tests, and Levene [14] has discussed their power function.

We may assume that the original observations are all distinct. If two
consecutive items happen to be equal, we must suppose that with more accurate
measurement they would differ and the difference would be equally likely to be
plus or minus. The run lengths are reckoned on both suppositions, and the
corresponding probabilities calculated.

The total number of permutations of the N numbers $x_1, x_2 \ldots x_N$ is N!
The number producing exactly k positive differences is

(10.14.1)       $\phi_N(k) = \sum_{i=0}^{k} (-1)^i \binom{N + 1}{i} (k + 1 - i)^N$

The probability of exactly k positive differences is therefore $\phi_N(k)/N!$ The same
expression holds for the probability of k negative differences. The expectation of
k is given by

(10.14.2)       $E(k) = \sum_{k=0}^{N-1} \dfrac{k\,\phi_N(k)}{N!} = \dfrac{N - 1}{2}$

as is obvious from symmetry. The variance of k is

(10.14.3)       $V(k) = \dfrac{N + 1}{12}$

The kurtosis of the distribution turns out to be $-\dfrac{6}{5} \cdot \dfrac{1}{N + 1}$ and so tends to
zero as N → ∞. For N > 12, the distribution of k is approximately normal. In
using the normal approximation, the correction for continuity should be applied
by diminishing the observed value of $|k - E(k)|$ by $\tfrac{1}{2}$. This is equivalent to
taking as a standard normal variate the quantity

(10.14.4)       $z = \pm \left( \dfrac{3}{N + 1} \right)^{1/2} \left( |2k - N + 1| - 1 \right)$
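
For small N the counts of Eq. (10.14.1) can be tabulated and the stated mean and variance confirmed numerically. The following sketch does this for N = 6; the names `phi` and `moments` are assumptions introduced for the illustration.

```python
from math import comb, factorial

def phi(n, k):
    """Number of permutations of n distinct values with exactly k positive differences."""
    return sum((-1) ** i * comb(n + 1, i) * (k + 1 - i) ** n
               for i in range(k + 1))

def moments(n):
    """Return (total probability, mean, variance) of k for series length n."""
    total = factorial(n)
    probs = [phi(n, k) / total for k in range(n)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return sum(probs), mean, var

counts = [phi(6, k) for k in range(6)]
total_prob, mean, var = moments(6)
print(counts, mean, var)
```

The counts add up to 6! = 720, and the mean and variance agree with (N − 1)/2 and (N + 1)/12.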


EXAMPLE 11 In a time-series of sweet potato production in the United States
over the years 1868-1937, it was found that the 69 differences were positive in
37 cases and negative in 32. With N = 70, we find z = +0.822. The probability
of a value greater numerically than this is 0.41, so that the hypothesis of
randomness can be accepted. There is no evidence, as far as this test goes, of a trend in
the series. If there is one it is swamped by the extent of the random variations.
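
The value of z in Example 11 follows from Eq. (10.14.4). A minimal sketch, in which the function name is an assumption and the two-tailed probability is obtained from the complementary error function:

```python
from math import erfc, sqrt

def z_positive_differences(k, n):
    """Continuity-corrected normal deviate for k positive differences in a series of n terms."""
    return sqrt(3 / (n + 1)) * (abs(2 * k - n + 1) - 1)

z = z_positive_differences(37, 70)
# P(|Z| > z) for a standard normal variate equals erfc(z / sqrt(2)).
p_two_sided = erfc(z / sqrt(2))
print(round(z, 3), round(p_two_sided, 2))
```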

A test may also be based on the total number of runs up and down. If r is
this number, and if we include the runs at the beginning and end, we find

(10.14.5)       $E(r) = \dfrac{2N - 1}{3}, \qquad V(r) = \dfrac{16N - 29}{90}$

Asymptotically, r is normally distributed.
The expected frequencies for runs of given length may also be calculated.
If $E(r_p)$ is the expected number of runs of length p exactly, and if $E(r'_p)$ is the
expected number of runs of length p or more,

(10.14.6)       $E(r_1) = \dfrac{5N + 1}{12}, \qquad E(r_2) = \dfrac{11N - 14}{60}, \qquad E(r'_3) = \dfrac{4N - 11}{60}$

The $\chi^2$ test may be used for comparing the observed and expected numbers, but
the distribution of $\chi_s^2$ is not quite that of Pearson's $\chi^2$. For N > 12 and the
three classes suggested above (p = 1, 2 and 3 or more), the calculated $\chi_s^2$ should
be multiplied by 6/7 and referred to the $\chi^2$ distribution for two degrees of
freedom, at least as an approximation.
EXAMPLE 12 In a certain time series of 36 observations, the number of
positive differences was 18 and the number of negative ones 17. The total number
of runs up and down (r) was 25, and the observed values for $r_1$, $r_2$ and $r'_3$ were
16, 8, and 1 respectively.

Here E(r) = 23.67, V(r) = 6.08, σ(r) = 2.47. The observed r is clearly not
significantly different from E(r), on the basis of the normal approximation.

The expectations of $r_1$, $r_2$ and $r'_3$ are 15.08, 6.37 and 2.22 respectively. The
value of $\chi_s^2$ is 1.143, and 6/7 of this is 0.98. For 2 d.f., this is not significant. The
tests agree in permitting us to accept the hypothesis that the given series is
random.
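
The numbers in Example 12 can be recomputed from the expressions for the expected run counts. This sketch assumes Eqs. (10.14.5) and (10.14.6) and the 6/7 adjustment; the function name and dictionary keys are introduced here for illustration.

```python
def runs_expectations(n):
    """Expected run counts for a random series of n distinct observations."""
    return {
        "E_r": (2 * n - 1) / 3,          # total number of runs
        "V_r": (16 * n - 29) / 90,       # variance of the total
        "E_r1": (5 * n + 1) / 12,        # runs of length exactly 1
        "E_r2": (11 * n - 14) / 60,      # runs of length exactly 2
        "E_r3_plus": (4 * n - 11) / 60,  # runs of length 3 or more
    }

exp36 = runs_expectations(36)
observed = [16, 8, 1]
expected = [exp36["E_r1"], exp36["E_r2"], exp36["E_r3_plus"]]

chi_s = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
adjusted = 6 / 7 * chi_s   # referred to chi-square with 2 d.f.
print(round(chi_s, 3), round(adjusted, 2))
```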

Levene [14] has considered the power of tests based on k and r. The null
hypothesis $H_0$ is that the observations are random. The alternative hypothesis
$H_1$ may be that there is a linear trend in the observations, or perhaps that there
is a cyclical trend. For detecting a linear trend, the test with k is much more
powerful than that with r, but it appears to be less powerful for certain cyclical
trends. Since in the limit, k and r have a joint normal distribution, and are
uncorrelated under $H_0$, the statistic

(10.14.7)       $\dfrac{[k - E(k)]^2}{V(k)} + \dfrac{[r - E(r)]^2}{V(r)} = \chi_s^2$

is approximately a $\chi^2$ variate with two degrees of freedom.


EXAMPLE 13 An artificial upward linear trend was added to the time series
of Example 12. This increased the number of positive differences to 21, and
reduced the number of negative ones to 14. The total number of runs (r) was
reduced to 21. With the continuity correction, k − E(k) = 3, V(k) = 37/12,
r − E(r) = −13/6, V(r) = 547/90, so that the expression in Eq. (7) is 108/37
+ 845/1094 = 2.92 + 0.77 = 3.69. For 2 d.f., P = 0.16, which indicates that
the added trend is not significant even at the 0.1 level. It may be seen that the
main contribution to $\chi_s^2$ arises from k rather than from r, thus verifying that the
former is more sensitive than the latter to linear trends.
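
The combined statistic of Eq. (10.14.7) for Example 13 can be evaluated as follows. This is a sketch under the assumption that each deviation is reduced in magnitude by one half before squaring; the helper name `corrected` is not from the text. For two degrees of freedom the upper-tail probability of $\chi^2$ is exp(−χ²/2).

```python
from math import exp

n, k, r = 36, 21, 21   # series length, positive differences, total runs

e_k, v_k = (n - 1) / 2, (n + 1) / 12
e_r, v_r = (2 * n - 1) / 3, (16 * n - 29) / 90

def corrected(deviation):
    """Magnitude of a deviation after the continuity correction of one half."""
    return max(abs(deviation) - 0.5, 0.0)

term_k = corrected(k - e_k) ** 2 / v_k
term_r = corrected(r - e_r) ** 2 / v_r
chi_s = term_k + term_r
p_value = exp(-chi_s / 2)   # upper tail of chi-square with 2 d.f.
print(round(term_k, 2), round(term_r, 2), round(chi_s, 2), round(p_value, 3))
```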


* 10.15 Jonckheere's Test This is designed to test the null hypothesis that
several samples are randomly drawn from the same population, when the
alternative hypothesis specifies a certain rank ordering of the populations. The
ordinary F-test may be used to test this null hypothesis, but the alternative does
not involve any particular rank ordering of the populations.

If there are only two samples we can, of course, use the Mann-Whitney test,
or, if appropriate, a one-tailed t-test, the alternative hypothesis being, say, that
$\mu_1 > \mu_2$. Jonckheere's test may be used with k samples. One application is to
time-series, when on each occasion the observed event may be one of k
possibilities. The null hypothesis would be that each of these occurs at random, and
the alternative hypothesis that the different types of event tend to occur in a
certain definite order.
For convenience we will suppose that the k samples are all of size r (although
it is not necessary to do so). If the i-th sample comes from a population with
distribution function $F_i(x)$, the null hypothesis is that the $F_i(x)$ are all the same
function of x, against the alternative that there is an ordering of the populations
such that

(10.15.1)       $F_1(x) < F_2(x) < \ldots < F_k(x)$

for all x. This condition will be satisfied if there is a real additive treatment effect,
different for each sample. We may express the alternative hypothesis as
$F_i(x) < F_j(x)$ for all x, where $i = 1, 2 \ldots k - 1$, and $j = i + 1 \ldots k$.

Let $x_{im}$ be the m-th item in the i-th sample, and $x_{jn}$ the n-th item in the j-th sample
(m, n = 1, 2 ... r). Also let $p_{im,jn} = 1$ if $x_{im} < x_{jn}$ and 0 if $x_{im} > x_{jn}$, and
define $p_{ij}$ by the relation

(10.15.2)       $p_{ij} = \sum_{m} \sum_{n} p_{im,jn}$

The greater the differences between the distribution functions $F_i(x)$ and $F_j(x)$,
the larger will tend to be the value of $p_{ij}$. If we then define S by

(10.15.3)       $S = 2 \sum_{i<j} p_{ij} - \tfrac{1}{2} k(k - 1) r^2$

the statistic S may be used for testing $H_0$ against $H_1$. A large value of S will lead
to the rejection of $H_0$.
The following example is given by Jonckheere [15]. There are four
measurements on each of four samples:

TABLE 10.10

           I        II       III       IV
          19        21        40        49
          20        61        99       110
          60        80       100       151
         130       129       149       160
Mean   57.25     72.75     97.00    117.50

Considering samples I and II, we note that the values 19 and 20 are each less
than four values in II, 60 is less than three values in II, and 130 is less than none.
Therefore,

$p_{12} = 4 + 4 + 3 = 11$

In the same way, we can calculate the other five values of $p_{ij}$ and so obtain

S = 2(11 + 12 + 13 + 11 + 12 + 12) − 96 = 46.

From the tables, Appendix B.10, we find that the probability that S ≥ 46 is
0.0168, which would suggest rejection of $H_0$. The usual F-test on the same data
gives a probability of 0.346, which would lead to acceptance of $H_0$. The F-test,
however, has a much wider variety of alternative hypotheses than the Jonckheere
test.

The quantity S is really a measure of the agreement of the ranking of all the
observations with the ranking that they would have if those in each separate
sample were tied. The ranks in the example above, along with the tied ranks,
are as shown in the Table 10.11, where for each sample the first column gives the
actual rank among all 16 observations and the second column the ranks that
would be allotted if all the observations in a batch were tied. As usual when
dealing with ties, the rank is the arithmetic mean of the ranks that would apply
if the ties were broken. Thus 2½ is the mean of 1, 2, 3 and 4 (see § 11.16).


TABLE 10.11

    I           II          III          IV
 1   2½      3   6½      4   10½      5   14½
 2   2½      7   6½      9   10½     11   14½
 6   2½      8   6½     10   10½     15   14½
13   2½     12   6½     14   10½     16   14½


For each pair of observations we may now allot a score of 1 if they are in the
same order on both rankings, −1 if they are in opposite orders, or 0 if they are
tied on either ranking. The sum of these scores is the quantity S, and it is easily
checked that S = 46. The first observation in I and the first in II, for instance,
contribute 1, since 1 and 3 are in the same order as 2½ and 6½. This quantity
S is used in calculating rank correlation by Kendall's method (see §§ 11.14
and 11.17).


The statistic S has a symmetrical distribution with expectation zero and
variance $\sigma^2 = \dfrac{N^2}{18}\{2N + 3 - (2r + 3)/k\}$, where N = kr. For large samples,
S/σ is approximately normal, especially if a continuity correction is applied by
subtracting 1 from S before dividing by σ. A better approximation is the
following:


(10.15.4)       $t = \left[ \dfrac{\nu S^2}{(\nu + 1)\sigma^2 - S^2} \right]^{1/2}$

i.e., it has the Student-t distribution with ν degrees of freedom. The quantity ν
depends on the kurtosis of the distribution of S and is given by

(10.15.5)       $\nu = \dfrac{25 N^4 \{2N + 3 - (2r + 3)/k\}^2}{6\,[N^3(6N^2 + 15N + 10) - k r^3(6r^2 + 15r + 10)]} - 3$


In the example, σ = 21.42 and the approximate z value is 45/σ = 2.10,
which corresponds to a probability 0.018. The value of ν turns out to be 36 and
$t_s = 2.21$. The probability for a value of t exceeding 2.21 is 0.017 which is very
close to the correct value.

When there are only two samples, S reduces to $r^2 - 2U$, or $r^2 - 2U'$, where
U is the statistic used in the Mann-Whitney test (§ 10.12).
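
The quantities quoted for the example of Table 10.10 can be recomputed as follows. This is a sketch only: it assumes equal sample sizes and no ties, takes ν = 36 from the text rather than computing it, and uses the continuity-corrected value S − 1 in both approximations, which is what reproduces the quoted 2.10 and 2.21. The function name `jonckheere` is introduced here for illustration.

```python
from math import sqrt

def jonckheere(samples):
    """Return (S, sigma) for k samples of equal size r, listed in the predicted order."""
    k = len(samples)
    r = len(samples[0])
    n = k * r
    p_sum = 0
    for i in range(k):
        for j in range(i + 1, k):
            # p_ij: number of pairs with the item from sample i below the item from sample j
            p_sum += sum(1 for a in samples[i] for b in samples[j] if a < b)
    s = 2 * p_sum - k * (k - 1) * r * r // 2
    sigma = sqrt(n * n / 18 * (2 * n + 3 - (2 * r + 3) / k))
    return s, sigma

table_10_10 = [
    [19, 20, 60, 130],    # I
    [21, 61, 80, 129],    # II
    [40, 99, 100, 149],   # III
    [49, 110, 151, 160],  # IV
]

s, sigma = jonckheere(table_10_10)
s_c = s - 1                      # continuity correction
z = s_c / sigma
nu = 36                          # degrees of freedom as quoted in the text
t = sqrt(nu * s_c ** 2 / ((nu + 1) * sigma ** 2 - s_c ** 2))
print(s, round(sigma, 2), round(z, 2), round(t, 2))
```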


PROBLEMS 


A. (§§ 10.1-10.5)

1. The French naturalist Buffon (1707-1788) once tossed a coin 4040 times and
obtained 2048 heads and 1992 tails. Show that this result is not at all surprising with a
good coin. Hint: Find the probability of a discrepancy from the expected result at
least as great as this, using the chi-square test.
2. Over a period of time, the numbers of aircraft accidents that occurred on the
different days of the week were noted, with the following result:

Day     M    Tu    W    Th    F    S    Sun
$f_o$     16     8   12     1    9   14     14

Is there good reason to doubt that an accident is equally likely to occur on any day
of the week?

3. In the following table, x is the number of 5's or 6's observed in a throw with
five dice. Is this result of 243 throws consistent with the hypothesis that the dice are
true?

x       0    1    2    3    4    5
$f_o$     23   90   81   30   19    0

Hint: Pool the last two classes.

4. A student dealt 26 cards from an ordinary deck and counted the number of
honor cards in the hand dealt (A, K, Q, J, and 10 counting as honors). He did this 50
times, with the following result:


sin s е m$» g 9 0 Ш S 


д | 00 02 09 27 60 96 112 96 60 27 09 02 00 00 


Would you reject the hypothesis that the cards were well shuffled between each deal?
Hint: The expected frequency of x honor cards, if the hypothesis is true, is
$50 \binom{20}{x} \binom{32}{26 - x} \Big/ \binom{52}{26}$, giving the theoretical frequencies $f_e$.
Pool the first four and the last five classes.

5. In an insecticide test, 20 insects were put into each of 100 jars and subjected to a
standard dose of insecticide. The number surviving (x) after three hours was counted
for each jar:


Does this distribution appear to be binomial? Hint: If each insect has the same chance
θ of surviving, the distribution will be binomial. Estimate θ from the relation $20\theta = \bar{x}$,
and calculate the theoretical frequencies. The first two and the last two frequencies
should be pooled. Note that the last theoretical frequency is for x = 9 or more.

6. Samples of 50 balls were taken repeatedly from a mixture of 100 red balls and
1100 white ones, and the number (x) of red balls in each sample was noted. The balls
were returned and well mixed after each sampling. In 300 trials the following values of
x were observed.

x       0    1    2    3    4    5    6    7    8    9   10 or more
$f_o$      1   16   36   48   62   51   41   22   18    5    0

Test the agreement of this distribution with a Poisson distribution of parameter μ = 50/12.

7. Calculate $\bar{x}$ (the mean of x) for the data of Problem 6, and fit the observations
with a Poisson distribution of parameter $\bar{x}$. Test the agreement now. Hint: When μ
is estimated from the samples, the degrees of freedom are reduced by 1.

8. A classical example (originally given by von Bortkiewicz) of the distribution of
rare events is that of the deaths of Prussian cavalrymen from the kicks of horses during
the 20 years 1875-1894. The frequency distribution of such deaths in 10 army corps,
per corps per annum, was

x       0     1    2    3    4
$f_o$    109    65   22    3    1

Fit a Poisson distribution and test the goodness of fit.


9. The following table gives a distribution of lengths of time (in seconds) of
telephone calls at a certain exchange:

Time         Number of Calls
  0 -  99          1
100 - 199         28
200 - 299         88
300 - 399        180
400 - 499        247
500 - 599        260
600 - 699        133
700 - 799         42
800 - 899         11
900 - 999          5
Total            995

The mean and standard deviation are 477.3 sec and 145.7 sec respectively. Fit a normal
curve to the data and test the goodness of fit.


10. In a study of plant disease (spotted wilt of tomatoes) the numbers (x) of diseased
plants were counted in each of 160 groups of plants. Each group contained nine plants,
evenly spaced, so that x could take integral values from 0 to 9 inclusive. The following
distribution was found:

x       0    1    2    3    4    5    6    7    8
$f_o$     36   48   38   23   10    3    1    1    0

Assuming that the probability of being diseased is constant (θ), fit the distribution
with a binomial, estimating θ from the sample. Test the agreement by the chi-square
test, using three different procedures for pooling: (a) the last three classes,
(b) the last four, (c) the last five. (Note that, in this example, wider pooling tends to
disguise departures from the theoretical distribution.)


B. (§§ 10.6-10.8)

1. The cumulative frequencies $F_o$ corresponding to given values of x (upper class
boundaries) in a sampling experiment, are shown in the following table. The theoretical
cumulative frequencies $F_e$ from a certain normal population are also given:

Upper Class Boundary (x)     $F_o$       $F_e$
        20.5                   2        0.6
        22.5                   9        4.5
        24.5                  24       20.9
        26.5                  73       68.5
        28.5                 162      161.7
        30.5                 300      287.6
        32.5                 402      401.3
        34.5                 469      471.5
        36.5                 505      500.8
        38.5                 510      509.2
        40.5                 511      511.0

Use the Kolmogorov test to show that it is reasonable to accept the hypothesis that the
population sampled had the normal distribution corresponding to the column $F_e$.

2. The following table gives cumulative frequencies of correct responses to a
psychological test for (a) a group of 24 normal subjects, (b) a group of 24 schizophrenic
subjects. The test required the perception of groupings in a design exposed to the
subject for a variable time (Kaswan, British Journ. Psych., 1958, p. 131).

Exposure Time     F(Normal)     F(Schizophrenic)
 0.01 sec             4               2
 0.04                10               4
 0.1                 13               5
 0.25                17              10
 0.75                21              16
 5.0                 24              21
10.0                 24              24

Use the Kolmogorov-Smirnov two-sample test to determine whether there is a significant
difference between the two groups.


3. Can the following sample be reasonably regarded as coming from a uniform
distribution on the interval (35, 70): 36, 42, 44, 50, 64, 58, 56, 50, 37, 48, 52, 63,
57, 43, 39, 42, 47, 61, 53, 58? Use the Kolmogorov test. Hint: Calculate theoretical
relative cumulative frequencies, (x − 35)/35, at the given values of x.

4. Are the following observations (on percentage change in systolic pressure in
dogs under experimental conditions) consistent with a normal distribution of mean
16.0 and variance 30.0?

20.6, 11.6, 7.5, 10.5, 13.9, 16.2, 14.8
17.8, 26.9, 13.5, 20.1, 22.5, 11.1, 16.7

5. Since F(x) in the Kolmogorov test is assumed to be continuous, there is zero
probability of two identical values of x. However, since measurements are made with
limited accuracy, ties do occur. What would be in general the effect of this on the value
of $D_N$?

C. (§§ 10.9-10.12)

1. The observed values $x_1$ and $x_2$ for two paired samples are given. Is it reasonable
to assume that no difference exists between the medians of the two populations from
which these samples were taken? Use the sign test.

$x_1$    15   19   31   36   10   11   19   15   10   16
$x_2$    19   30   26    8   10    6   17   13   22    8


2. For nine animals, tested under control conditions and experimental conditions,
the following values of a measured variable were observed:

Animal No.        1    2    3    4    5    6    7    8    9
Control          21   24   26   32   55   82   46   55   88
Experimental     18    9   23   26   82  199   42   30   62

Test whether a significant difference exists between the medians, using the Wilcoxon
signed-ranks test.

3. The following table gives scores of a group of engineering students in (a)
mathematics, (b) graphics. Use the signed-ranks test to determine whether the median scores
of such students differ significantly in the two subjects.

Student No.   Maths   Graphics     Student No.   Maths   Graphics
     1          66       51            16          86       72
     2          20       51            17          65       73
     3          66       45            18          33       69
     4          73       77            19          42       51
     5          59       68            20          66       66
     6          58       51            21          59       82
     7          37       50            22          57       74
     8          85       81            23          27       44
     9          57       66            24          55       65
    10          69       65            25          61       65
    11          63       54            26          86       71
    12          75       53            27          52       65
    13          87       73            28          79       62
    14          34       40            29          63       62
    15          67       73            30          75       64


times (in minutes) are given:

Cat No.       1   2   3   4
Time         45  43  33  25

Rabbit No.    1   2   3   4   5   6   7   8   9  10  11  12  13  14
Time         39  33  30  30  28  28  23  22 [?] [?] 17  16  16 [?]

[...] are random samples from populations with the same distribution, as against
the alternative hypothesis that the cat times are stochastically [...]

5. In the following table the variable is the number of trials required by a rat to 
learn a new pattern of behavior when placed in a new situation. The experimental 
group of rats had been trained in a certain way; the control group had not. Test 
whether the previous training significantly affects the ability to learn. 


Experimental Group 78 64 75 45 82 54 71 


Control Group 110 70 53 51 62 93 106 88 67 72 


D. (§§ 10.13-10.15)


1. The annual marriage rates per 1000 of population in the United States for 1885,
1890, 1895, ..., 1950 are: 9.2, 9.0, 8.9, 9.3, 10.0, 10.3, 10.0, 12.0, 10.3, 9.2, 10.4,
12.1, 12.2, 11.1. Does there appear to be a significant upward trend? (Use the number
of positive differences and also the total number of runs.)

2. Apply the approximate chi-square test on runs of a given length to test the
randomness of the time series in Problem D-1.
3. A student opened a set of mathematical tables with the entries blocked off in
sets of five, and, starting at random, added the five terminal digits in each block of five
numbers. Going consecutively through 50 blocks, he obtained the following values:

12, 15, 18, 30, 33, 25, 28, 22, 23, 17
25, 18, 22, 13, 17, 18, 22, 25, 27, 30
28, 32, 24, 27, 20, 22, 15, 18, 20, 23
12, 15, 27, 30, 33, 25, 28, 20, 23, 17
25, 18, 20, 13, 17, 18, 22, 33, 27, 30

Test the sequence for randomness, using runs up and down.

4. Snedecor (Statistical Methods, Iowa State College Press, 1956) has given the
following table showing the amounts of four different fats (in grams) absorbed by
doughnuts, six batches of doughnuts being used for each fat.


              Fat
  A      B      C      D
 164    178    175    155
 172    191    193    166
 168    197    178    149
 177    182    171    164
 195    177    176    168
 156    185    163    170

Test for a significant difference between the means by Jonckheere's method.
Hint: First order the samples in increasing order of their means. Count 1/2 for ties in
computing $p_{ij}$. Use the normal approximation.
5. The following table is supposed to represent the scores of samples from three
different groups of teachers on a certain personality test. Does there seem to be a real
ordering of the groups?

Group I    Group II    Group III
   96         82          115
  128        124          149
   83        132          166
   61        135          147
  101        109          129

Chapter 11
DISTRIBUTIONS OF PAIRS OF VARIATES 


11.1 The Classical Regression Problem for a Population We now propose to 
investigate some measures of the relationship between two random variables 
(variates), X and Y, both of which are capable of measurement on each member 
of a given population. For convenience we shall think of them as distributed 
continuously, but the extension to discrete distributions will usually be obvious 
—a matter of replacing probability densities by probabilities and integrals by 
sums. 

In general, the joint probability that for a particular member of the population
the value of X lies between x and x + dx and the value of Y lies between y and
y + dy is given by f(x, y) dx dy. The probability density for X alone (regardless
of Y) is

(11.1.1)    $g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$

and that for Y alone is

(11.1.2)    $h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$

The probability density of Y, for a given value x of X, is

(11.1.3)    $f(y|x) = f(x, y)/g(x)$

and similarly for f(x|y). The expectation of Y, for given X, is defined as

(11.1.4)    $\eta_x = E(Y|x) = \int_{-\infty}^{\infty} y\,f(y|x)\,dy$

It is, of course, a function of the given value x of X. The graph of $\eta_x$ as a
function of x is called the true regression curve of Y on X. If the regression is
linear the graph is a straight line, with the equation

(11.1.5)    $\eta_x = \alpha + \beta x$

The parameter $\beta$ is the true regression coefficient of Y on X, and is the slope of
the straight line. The other parameter $\alpha$ represents the intercept on the axis of Y.
(See Fig. 47.)

The variance of Y, for given X, is similarly defined by

(11.1.6)    $\sigma_{Y \cdot x}^2 = V(Y|x) = \int_{-\infty}^{\infty} (y - \eta_x)^2 f(y|x)\,dy$

This is also in general a function of x, although in some circumstances it turns
out to be independent of x. It is the variance of Y for those members of the
population whose X-values lie in a thin strip of width dx; these members are
said to form an X-array. They are represented by dots in Figure 47.


FIG. 47 LINEAR REGRESSION IN A POPULATION


The regression parameters $\alpha$ and $\beta$ may be expressed in terms of the moments
of the distributions of X and Y. Let us define the two means by

(11.1.7)    $\mu_X = \int g(x)\,x\,dx = \iint x f(x, y)\,dy\,dx$
            $\mu_Y = \int h(y)\,y\,dy = \iint y f(x, y)\,dx\,dy$

and the two variances and the covariance by

(11.1.8)    $\sigma_X^2 = \int g(x)(x - \mu_X)^2\,dx$
            $\sigma_Y^2 = \int h(y)(y - \mu_Y)^2\,dy$
            $\sigma_{XY} = \iint f(x, y)(x - \mu_X)(y - \mu_Y)\,dy\,dx$

all integrals being taken from $-\infty$ to $\infty$. From Eqs. (3) and (4) it follows that

(11.1.9)    $g(x)\,\eta_x = \int y f(x, y)\,dy$

so that, on writing $\eta_x = \alpha + \beta x$ and integrating,

(11.1.10)    $\int (\alpha + \beta x)\,g(x)\,dx = \iint y f(x, y)\,dy\,dx$

or, using Eq. (7),

(11.1.11)    $\alpha + \beta\mu_X = \mu_Y$

If we multiply Eq. (9) by x before integrating, we obtain

(11.1.12)    $\int (\alpha x + \beta x^2)\,g(x)\,dx = \iint x y f(x, y)\,dy\,dx$

which may be written

(11.1.13)    $\alpha\mu_X + \beta\mu'_{2X} = \mu'_{XY}$

where $\mu'_{2X}$ is the second moment about the origin for X, and $\mu'_{XY}$ is the product
moment about the origin for X and Y. Multiplying Eq. (11) by $\mu_X$ and subtracting
it from Eq. (13), we obtain

            $\beta(\mu'_{2X} - \mu_X^2) = \mu'_{XY} - \mu_X\mu_Y$

which is equivalent to

(11.1.14)    $\beta\sigma_X^2 = \sigma_{XY}$

We have, therefore,

(11.1.15)    $\beta = \sigma_{XY}/\sigma_X^2 = \rho_{XY}\sigma_Y/\sigma_X$

where $\rho_{XY} = \sigma_{XY}/(\sigma_X\sigma_Y)$, the Pearson coefficient of correlation between X and Y.
From Eq. (11),

(11.1.16)    $\alpha = \mu_Y - \beta\mu_X$

so that the two parameters of the regression line are now expressed in terms of
the means, variances and covariance of X and Y. Using these expressions, the
equation of the line may be written

(11.1.17)    $\eta_x - \mu_Y = \beta(x - \mu_X) = \rho_{XY}\sigma_Y\,\dfrac{x - \mu_X}{\sigma_X}$

which indicates that the line passes through the point with coordinates $(\mu_X, \mu_Y)$.
A similar equation may be obtained by interchanging X and Y. If $\xi_y$ is the
expectation of X, for a given value y of Y,

(11.1.18)    $\xi_y - \mu_X = \rho_{XY}\sigma_X\,\dfrac{y - \mu_Y}{\sigma_Y}$

This line also passes through the point $(\mu_X, \mu_Y)$ but in general it does not
coincide with the first line. There are therefore two regression lines, one of Y
on X and one of X on Y, given respectively by Eqs. (17) and (18). The first line
represents the expectation of Y for a given X, and the second line the expectation
of X for a given Y.




The variance of Y, for a given X, may vary from one value of X to another,
but we can define a weighted average of the variances in the different X-arrays
by the relation

(11.1.19)    $\sigma_{Y \cdot X}^2 = \int_{-\infty}^{\infty} \sigma_{Y \cdot x}^2\,g(x)\,dx$

the variance $\sigma_{Y \cdot x}^2$ for a given x being weighted with the probability density for
this value of x. The quantity $\sigma_{Y \cdot X}^2$ is called the variance of estimate of Y. It is a
measure of the average variability of Y around the regression line of Y on X.
Using Eqs. (3) and (6) and noting that

            $(y - \eta_x)^2 = [y - \mu_Y - \beta(x - \mu_X)]^2$
                $= (y - \mu_Y)^2 + \beta^2(x - \mu_X)^2 - 2\beta(y - \mu_Y)(x - \mu_X)$

we obtain, from Eq. (19),

(11.1.20)    $\sigma_{Y \cdot X}^2 = \iint [(y - \mu_Y)^2 + \beta^2(x - \mu_X)^2 - 2\beta(y - \mu_Y)(x - \mu_X)]\,f(x, y)\,dx\,dy$
                $= \sigma_Y^2 + \beta^2\sigma_X^2 - 2\beta\sigma_{XY}$

From Eq. (15), $\beta^2\sigma_X^2 = \sigma_{XY}^2/\sigma_X^2 = \beta\sigma_{XY} = \rho_{XY}^2\sigma_Y^2$, so that

(11.1.21)    $\sigma_{Y \cdot X}^2 = \sigma_Y^2(1 - \rho_{XY}^2)$

It is clear from this relation, since $\sigma_{Y \cdot X}^2$ and $\sigma_Y^2$ are necessarily non-negative,
that $\rho_{XY}^2$ cannot exceed 1. In other words, for all possible distributions of X and Y,

(11.1.22)    $-1 \le \rho_{XY} \le 1$

As previously indicated (§ 2.13), $\rho_{XY}$ is a measure of the degree of relationship
between X and Y. If X and Y are independent, $\rho_{XY} = 0$. If Y is precisely
proportional to X, so that all the Y values lie on the straight regression line, then
$\sigma_{Y \cdot X}^2 = 0$ and $\rho_{XY} = \pm 1$. These are the extreme cases.
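
The relations (11.1.15), (11.1.16) and (11.1.21) can be tried out numerically. The following is only a sketch in Python: the function name and the population moments are hypothetical, chosen purely for illustration, and it assumes that the regression of Y on X is linear.

```python
import math

def regression_from_moments(mu_x, mu_y, var_x, var_y, cov_xy):
    """Return (alpha, beta, rho, variance of estimate) for a linear regression.

    beta  follows Eq. (11.1.15), alpha follows Eq. (11.1.16),
    and the variance of estimate follows Eq. (11.1.21).
    """
    beta = cov_xy / var_x
    alpha = mu_y - beta * mu_x
    rho = cov_xy / math.sqrt(var_x * var_y)
    var_est = var_y * (1.0 - rho ** 2)
    return alpha, beta, rho, var_est

# Hypothetical moments (not taken from the text).
mu_x, mu_y, var_x, var_y, cov_xy = 10.0, 20.0, 4.0, 9.0, 3.0
alpha, beta, rho, var_est = regression_from_moments(mu_x, mu_y, var_x, var_y, cov_xy)

# The same variance of estimate written in the expanded form of Eq. (11.1.20).
var_est_expanded = var_y + beta ** 2 * var_x - 2.0 * beta * cov_xy
print(alpha, beta, rho, var_est, var_est_expanded)
```

With these assumed moments the two forms of the variance of estimate agree, as Eqs. (11.1.20) and (11.1.21) require.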


11.2 The Bivariate Normal Surface An important special case of a two-
variate distribution is that with a joint probability density:

(11.2.1)    $f(x, y) = K e^{-Q}$

where

(11.2.2)    $K^{-1} = 2\pi\sigma_X\sigma_Y(1 - \rho^2)^{1/2}$

and

(11.2.3)    $Q = \left[\dfrac{(x - \mu_X)^2}{\sigma_X^2} + \dfrac{(y - \mu_Y)^2}{\sigma_Y^2} - \dfrac{2\rho(x - \mu_X)(y - \mu_Y)}{\sigma_X\sigma_Y}\right] \div [2(1 - \rho^2)]$

Here $\rho$ has been written for $\rho_{XY}$. The quantity Q is a quadratic form in the
standardized variates,

(11.2.4)    $z = \dfrac{x - \mu_X}{\sigma_X}, \qquad v = \dfrac{y - \mu_Y}{\sigma_Y}$

so that the probability density in terms of the variates z and v is

(11.2.5)    $g(z, v) = (2\pi)^{-1}(1 - \rho^2)^{-1/2} \exp\left[-\dfrac{z^2 + v^2 - 2\rho z v}{2(1 - \rho^2)}\right]$


This represents a surface known as the bivariate normal surface, and pictured
(in a truncated form) in Figure 48. It is bell-shaped, asymptotic in all directions
to the z-v plane. Sections parallel to the z-v plane are ellipses and sections
parallel to either the g-z plane or the g-v plane are normal curves.

FIG. 48 BIVARIATE NORMAL SURFACE

The distributions of z and v separately are given by integrating Eq. (5). Thus,

(11.2.6)    $f(z) = \int_{-\infty}^{\infty} g(z, v)\,dv = (2\pi)^{-1/2} e^{-z^2/2}$

(11.2.7)    $h(v) = \int_{-\infty}^{\infty} g(z, v)\,dz = (2\pi)^{-1/2} e^{-v^2/2}$

These are both standard normal distributions. The probability density for v,
for a given z, is

(11.2.8)    $g(v|z) = \dfrac{g(z, v)}{f(z)} = [2\pi(1 - \rho^2)]^{-1/2} \exp\left[-\dfrac{(v - \rho z)^2}{2(1 - \rho^2)}\right]$

The expectation of v for a given z is therefore

            $\eta_z = \int_{-\infty}^{\infty} v\,g(v|z)\,dv = \rho z$

This represents a straight line through the origin with slope $\rho$. The regression is
therefore linear, and this holds also for the second regression line, with equation

            $\xi_v = \rho v$

Each array of v's has, for given z, the variance

            $\sigma_{v \cdot z}^2 = \int_{-\infty}^{\infty} (v - \rho z)^2 g(v|z)\,dv$

and, on carrying out the integration, this becomes

(11.2.9)    $\sigma_{v \cdot z}^2 = 1 - \rho^2$


FIG. 49 HORIZONTAL SECTION OF BIVARIATE NORMAL SURFACE

Transforming back to the original variates X and Y, we obtain

(11.2.10)    $\sigma_{Y \cdot x}^2 = \sigma_Y^2(1 - \rho^2)$

and the variance is therefore independent of x. The weighted average $\sigma_{Y \cdot X}^2$ is
then the same as $\sigma_{Y \cdot x}^2$, for all x. A distribution with this property is called
"homoscedastic" (from Greek words meaning "equal scattering"). A similar
property holds for the y-arrays of X for given values of y.

In the z-v plane the two regression lines have the same slope, one with
respect to Oz and the other with respect to Ov (Figure 49). A section of the
bivariate normal surface by a horizontal plane, g = const., is an ellipse with
equation

(11.2.11)    $z^2 + v^2 - 2\rho z v = \text{const.}$

If this ellipse is drawn on the z-v plane, the tangents at the points where it is
cut by the regression lines are parallel to the axes.

The bivariate normal distribution occupies a central position in the theory 
of two-variate distributions, similar to that of the normal distribution for a single 
variate. Most of the early classical work of Karl Pearson, Galton and others, on 
regression and correlation, was based on this distribution. 
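
The statements about the arrays of the standardized bivariate normal surface (conditional mean $\rho z$, conditional variance $1 - \rho^2$, standard normal marginal) can be checked by direct numerical integration of Eq. (11.2.5). The sketch below is illustrative only: the function names, the integration range and the values $\rho = 0.6$, $z = 1$ are assumptions made for the check, not part of the text.

```python
import math

def g(z, v, rho):
    """Standardized bivariate normal density, Eq. (11.2.5)."""
    q = (z * z + v * v - 2.0 * rho * z * v) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def array_moments(z, rho, lo=-12.0, hi=12.0, n=4800):
    """Integrate over v by the trapezoidal rule for a fixed z.

    Returns (marginal density f(z), mean of v given z, variance of v given z).
    """
    h = (hi - lo) / n
    m0 = m1 = m2 = 0.0
    for k in range(n + 1):
        v = lo + k * h
        weight = h * (0.5 if k in (0, n) else 1.0)
        d = g(z, v, rho) * weight
        m0 += d
        m1 += v * d
        m2 += v * v * d
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean * mean

rho, z = 0.6, 1.0  # assumed values for the check
f_z, mean_v, var_v = array_moments(z, rho)
print(f_z, mean_v, var_v)
```

The marginal value can be compared with Eq. (11.2.6), the mean with $\rho z$, and the variance with Eq. (11.2.9).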


11.3 Linear Regression as determined from a Sample In practice, there is a
variety of situations involving two variables. Both X and Y may be random, or
only one of them, or neither. One variable, for instance, may be the time, as in a
time-series of temperatures, stock-market prices, sunspot numbers, etc. Also, in
many cases, the values of X are pre-selected instead of being chosen at random,
as in a physics experiment where conveniently chosen weights are hung on a
wire or spring and the extension produced is measured. Here neither X nor Y
is a random variable in the ordinary sense, but both may be subject in different
degrees to experimental error. The true relation between X and Y is a functional
one. Obviously, unless X is a random variable it makes no sense to speak of its
distribution, expectation or variance, and of course the same holds for Y.

In the usual regression problem, Y is a random variable and X may be either
random or fixed. The classical situation is that in which both are random, a
sample being selected randomly from a population such as that considered in
§ 11.1, and the values of X and Y measured on each of the N selected items. We
assume that for a given value x of X, Y is of the form

(11.3.1)    $Y = \alpha + \beta x + \varepsilon$

where $\varepsilon$ is normally distributed with mean 0 and variance $\sigma^2$. The relation

(11.3.2)    $Y = \alpha + \beta X$

which really means

(11.3.3)    $E(Y|X = x) = \alpha + \beta x$

is sometimes called the structural relation. It is the underlying relationship which
is disturbed, in an actual sample, by the sampling fluctuation of Y and by the
errors of measurement of both X and Y.

We will assume for the present that X and Y are measured without appreciable
error, so that we need consider only the sampling fluctuation, represented by
the quantity $\varepsilon$ in Eq. (1). If a straight line is fitted to the observed values of X
and Y, this line will furnish estimates of the parameters $\alpha$ and $\beta$ of the true
regression line. In fact, as we shall see shortly, when the line is fitted by the
usual "least squares" method these estimates are unbiased.

Suppose the observed pairs of sample values are $(x_i, y_i)$, i = 1, 2, ..., N. The
problem is to find the equation of a straight line which will give the "best"
estimate of Y for a given value x of X and to find the standard error of this
estimate.

The observed sample values, plotted in the x-y plane, form a scatter
diagram (Fig. 50). The least-squares method of fitting a straight line chooses
the constants a and b of the equation

(11.3.4)    $y_e = a + bx$

in such a way that the sum of squares of the deviations of the sample points from
this line, measured parallel to the y-axis, will be a minimum. In Figure 50 the
deviation of $y_i$ from the least-squares line is denoted by $e_i$, and the deviation from
the true regression line $\eta = \alpha + \beta x$ by $\varepsilon_i$.

FIG. 50 SAMPLE LINEAR REGRESSION

If $S = \sum e_i^2 = \sum (y_i - a - bx_i)^2$, the minimum value of S will be given by
solving the simultaneous equations $\partial S/\partial a = 0$, $\partial S/\partial b = 0$. These may be written,
after cancelling a factor $-2$,

(11.3.5)    $\sum (y_i - a - bx_i) = 0$
            $\sum x_i(y_i - a - bx_i) = 0$

which are called the normal equations of the problem. They can be rearranged as

(11.3.6)    $Na + b\sum x_i = \sum y_i$
            $a\sum x_i + b\sum x_i^2 = \sum x_i y_i$

Solving for a and b, we obtain

(11.3.7)    $b = s_{XY}/s_X^2$

(11.3.8)    $a = \bar{y} - b\bar{x}$

where $\bar{x} = \sum x_i/N$, $\bar{y} = \sum y_i/N$, $(N - 1)s_X^2 = \sum x_i^2 - N\bar{x}^2$, and
$(N - 1)s_{XY} = \sum x_i y_i - N\bar{x}\bar{y}$. Here $s_X^2$ is the sample variance of X and $s_{XY}$ is the
sample covariance of X and Y.
The Pearson coefficient of correlation for the sample is defined by

(11.3.9)    $r_{XY} = \dfrac{s_{XY}}{s_X s_Y} = b\,\dfrac{s_X}{s_Y}$

so that the equation of the least-squares line may be written, using Eqs. (7) and
(8), as

(11.3.10)    $y_e - \bar{y} = r\,\dfrac{s_Y}{s_X}(x - \bar{x})$

where r stands for $r_{XY}$.
For any given value x of X, $y_e$ provides an estimate of the corresponding value
of Y.

All the above argument can be carried through with X and Y interchanged.
If we wish to find the best straight line for estimating X for a given value y of Y,
the least squares criterion for the deviations of the sample points from this line,
measured parallel to the x-axis, will give

(11.3.11)    $x_e - \bar{x} = b'(y - \bar{y}) = r\,\dfrac{s_X}{s_Y}(y - \bar{y})$

where

(11.3.12)    $b' = s_{XY}/s_Y^2$

It may be noted that $bb' = r^2$, so that r is the geometric mean of the two slopes
(one measured from the x-axis, one from the y-axis). The two regression lines
intersect at the point whose coordinates are $(\bar{x}, \bar{y})$.

In most practical situations one of the two variates will, for non-statistical
reasons, be the one we would like to estimate, and this is the one we label Y. It
is therefore hardly necessary to treat the second regression line in detail. What
is said in subsequent paragraphs about the regression of Y on X can be applied,
with minor changes, to the regression of X on Y.
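
The least-squares solution of this section can be illustrated with a few lines of Python. This is a sketch, not a prescribed routine: the function name `least_squares_line` and the five data pairs are hypothetical. It computes a and b from Eqs. (11.3.7) and (11.3.8), the second slope b' from Eq. (11.3.12), and r from Eq. (11.3.9), then evaluates the left-hand sides of the normal equations (11.3.5).

```python
def least_squares_line(xs, ys):
    """Return (a, b, b_prime, r) for the two sample regression lines."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # (N - 1) s_X^2
    syy = sum((y - y_bar) ** 2 for y in ys)                       # (N - 1) s_Y^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # (N - 1) s_XY
    b = sxy / sxx                  # slope of Y on X, Eq. (11.3.7)
    a = y_bar - b * x_bar          # intercept, Eq. (11.3.8)
    b_prime = sxy / syy            # slope of X on Y, Eq. (11.3.12)
    r = sxy / (sxx * syy) ** 0.5   # Eq. (11.3.9)
    return a, b, b_prime, r

# Hypothetical sample, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, b_prime, r = least_squares_line(xs, ys)

# Left-hand sides of the normal equations (11.3.5); both should vanish.
eq1 = sum(y - a - b * x for x, y in zip(xs, ys))
eq2 = sum(x * (y - a - b * x) for x, y in zip(xs, ys))
print(a, b, b_prime, r, eq1, eq2)
```

Working with deviations from the means, rather than with raw sums of squares, is a choice made here to limit rounding error; it is algebraically the same as Eq. (11.3.6).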


11.4 Computation of the Regression and Correlation Coefficients Calculation
of b and r requires, in effect, the determination of two variances and a
covariance, since $b = s_{XY}/s_X^2$ and $r = s_{XY}/(s_X s_Y)$. The most convenient formulas
for ungrouped variates are the following:

(11.4.1)    $b = \dfrac{N\sum xy - \sum x \sum y}{N\sum x^2 - (\sum x)^2}$

(11.4.2)    $r^2 = \dfrac{[N\sum xy - \sum x \sum y]^2}{[N\sum x^2 - (\sum x)^2][N\sum y^2 - (\sum y)^2]}$

If, for the sake of simplifying the calculations, we put $u = (x - x_0)/h$ and
$v = (y - y_0)/k$, where $x_0$, $y_0$, h and k are chosen arbitrarily, it will make no
difference to the value of r if we replace x and y by u and v throughout Eq. (2).
The value of b, however, obtained by using u and v in Eq. (1) must be multiplied
by k/h to give the value in terms of x and y.


EXAMPLE 1 Specimens of steels containing various percentages of nickel
were tested for toughness with the following results (X is toughness in arbitrary
units, Y is percentage of nickel):

X | 47   50   52   52   54   56   58   59   60   60   62   64   65   66
Y | 2.5  2.7  2.8  2.8  2.9  3.2  3.2  3.3  3.4  3.5  3.5  3.6  3.7  3.8

Suppose it is desired to estimate percentage of nickel from measured toughness
in further specimens and to estimate th[...]
It is assumed that a random sample of nickel [...]
testing, and both X and Y were measured on [...]

If we let u = x - 50 and v = 10(y - 3.0), we get Table 11.1. Then,

TABLE 11.1

    u      v     u^2    v^2     uv
   -3     -5      9     25     15
    0     -3      0      9      0
    2     -2      4      4     -4
    2     -2      4      4     -4
    4     -1     16      1     -4
    6      2     36      4     12
    8      2     64      4     16
    9      3     81      9     27
   10      4    100     16     40
   10      5    100     25     50
   12      5    144     25     60
   14      6    196     36     84
   15      7    225     49    105
   16      8    256     64    128

  105     29   1235    275    525

$N\sum uv - \sum u \sum v = 14(525) - 105(29) = 4305$
$N\sum u^2 - (\sum u)^2 = 14(1235) - (105)^2 = 6265$
$N\sum v^2 - (\sum v)^2 = 14(275) - (29)^2 = 3009$

Since h = 1 and k = 0.1, we have

$b = 0.1 \times \dfrac{4305}{6265} = 0.0687$

$r^2 = \dfrac{(4305)^2}{6265 \times 3009} = 0.9831$

$r = 0.9915$

$\bar{x} = 50 + \dfrac{105}{14} = 57.5$

$\bar{y} = 3.0 + 0.1 \times \dfrac{29}{14} = 3.207$

so that for any given x

$y_e = 3.207 + 0.0687(x - 57.5)$

This is the relation required.
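
The arithmetic of Example 1 is easy to reproduce. The following Python sketch uses the data and the coding constants of the example ($x_0 = 50$, h = 1, $y_0 = 3.0$, k = 0.1) and applies Eqs. (11.4.1) and (11.4.2) to the coded values; the variable names are assumptions introduced for the illustration.

```python
import math

# Data of Example 1: toughness X and percentage of nickel Y.
x = [47, 50, 52, 52, 54, 56, 58, 59, 60, 60, 62, 64, 65, 66]
y = [2.5, 2.7, 2.8, 2.8, 2.9, 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.7, 3.8]

x0, h, y0, k = 50, 1, 3.0, 0.1
u = [(xi - x0) / h for xi in x]
v = [round((yi - y0) / k) for yi in y]  # round() removes floating-point noise

n = len(x)
su, sv = sum(u), sum(v)
suu = sum(ui * ui for ui in u)
svv = sum(vi * vi for vi in v)
suv = sum(ui * vi for ui, vi in zip(u, v))

num = n * suv - su * sv      # N*sum(uv) - sum(u)*sum(v)
den_u = n * suu - su ** 2    # N*sum(u^2) - (sum u)^2
den_v = n * svv - sv ** 2    # N*sum(v^2) - (sum v)^2

b = (k / h) * num / den_u            # Eq. (11.4.1), rescaled by k/h
r = num / math.sqrt(den_u * den_v)   # square root of Eq. (11.4.2)
x_bar = x0 + h * su / n
y_bar = y0 + k * sv / n
print(num, den_u, den_v, round(b, 4), round(r, 4), x_bar, round(y_bar, 3))
```

The three intermediate quantities and the rounded values of b, r and the two means can be compared with those given in the example.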


EXAMPLE 2 When the sample to be studied is large, it is usually convenient
to replace the scatter diagram by a two-way frequency table, the frequency
in any cell of the table being the number of individuals in the sample
falling within the corresponding class intervals for both X and Y. In Table
11.2, X represents the grade achieved on a mental test by applicants for a certain
type of industrial job, and Y the productive ability of these applicants after
hiring (measured as a percentage of a certain standard of production). The
auxiliary variables u and v are here defined as u = (x - 42.5)/5, v = (y - 85)/10.
The values of x and y shown in the table headings are the centers of the class-
intervals. The marginal column totals are denoted by $f_u$ (the frequency for a
given value of u), and the marginal row totals are similarly denoted by $f_v$.
Clearly,

$\sum f_u = \sum f_v = N = 260$

which is also the sum of all the frequencies in all the cells in the main body of the
table (the part surrounded by a double line). The values of $f_v$ are added vertically
and the values of $f_u$ horizontally.

The rows headed $uf_u$ and $u^2 f_u$ are obtained by multiplying each $f_u$ by the
corresponding u and then multiplying the product again by u. Similarly for the
columns headed $vf_v$ and $v^2 f_v$. These rows and columns give the means and
variances of u and v.

$\bar{u} = \dfrac{\sum uf_u}{N} = \dfrac{-17}{260} = -0.06538$

$\bar{v} = \dfrac{\sum vf_v}{N} = \dfrac{60}{260} = 0.2307$

$(N - 1)s_u^2 = \sum u^2 f_u - \dfrac{(\sum uf_u)^2}{N} = 735 - \dfrac{(17)^2}{260} = 733.89$

$(N - 1)s_v^2 = \sum v^2 f_v - \dfrac{(\sum vf_v)^2}{N} = 802 - \dfrac{(60)^2}{260} = 788.15$

The means and variances of X and Y are

$\bar{x} = 42.5 + 5\bar{u} = 42.2$
$\bar{y} = 85 + 10\bar{v} = 87.3$
$s_X^2 = 25 s_u^2 = 70.84$
$s_Y^2 = 100 s_v^2 = 304.3$

There are various methods in use for obtaining the covariance $s_{uv}$. One method,
which has the advantage of providing convenient checks on the calculations, is
indicated in the table.

TABLE 11.2
[Two-way frequency table of x (class centers 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5;
u = -3 to 3) against y, with the marginal totals $f_u$ and $f_v$ and the auxiliary rows and
columns $uf_u$, $u^2 f_u$, V, uV, $vf_v$, $v^2 f_v$, U, vU. The cell entries are not legible in this
copy; the totals used in the text are $\sum uf_u = -17$, $\sum u^2 f_u = 735$, $\sum vf_v = 60$,
$\sum v^2 f_v = 802$, $\sum uV = \sum vU = 313$.]

The row headed V is obtained by multiplying each cell-
frequency in a given column by the value of v corresponding to that cell and
adding along the column. Thus, for the first column, V = 1(0) + 2(-1)
+ 2(-2) + 2(-3) = -12. Each value of V is then multiplied by the corresponding
u to give the row headed uV.

A similar procedure is used for the columns U and vU. For the first row
U = 2(0) + 3(1) + 2(2) = 7, and for this row v = 4, so that vU = 28. The
checks are

$\sum V = \sum vf_v = 60$
$\sum U = \sum uf_u = -17$
$\sum uV = \sum vU = 313$

From the method of calculation it is clear that $\sum uV$ or $\sum vU$ gives the same
result as we might have obtained by multiplying each individual cell-frequency
by its own u and v, and adding over all the cells of the table. That is, it gives the
quantity $\sum fuv$ which we need in calculating the covariance. In fact,

$(N - 1)s_{uv} = \sum fuv - \dfrac{(\sum uf_u)(\sum vf_v)}{N} = 313 - \dfrac{(-17)(60)}{260} = 316.92$

so that $s_{XY} = 50 s_{uv} = 61.18$.

The regression line of Y on X is given by $y_e - \bar{y} = b(x - \bar{x})$, where
b = 61.18/70.84 = 0.864.

The coefficient of correlation between X and Y is given by

$r^2 = \dfrac{(61.18)^2}{(70.84)(304.3)} = 0.1737$

so that

r = 0.417

11.5 Variance about the Regression Line When the values of a and b
are chosen according to the equations of (11.3.5), the minimum value of S is
given by

(11.5.1)    $S_{min} = \sum [y_i - \bar{y} - b(x_i - \bar{x})]^2$
                $= \sum (y_i - \bar{y})^2 + b^2 \sum (x_i - \bar{x})^2 - 2b \sum (x_i - \bar{x})(y_i - \bar{y})$
                $= (N - 1)(s_Y^2 + b^2 s_X^2 - 2b s_{XY})$

where $s_X^2$, $s_Y^2$, and $s_{XY}$ are the sample variances of X and Y and the sample
covariance, respectively. Using Eq. (11.3.9), we may write this

(11.5.2)    $S_{min} = (N - 1)s_Y^2(1 - r^2)$

which may be compared with (11.1.21) and suggests that $S_{min}/(N - 1)$ is an
estimator of $\sigma_{Y \cdot X}^2$. It turns out, however, that a better (unbiased) estimator is
$S_{min}/(N - 2)$.

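Corresponding to the sample value $x_i$, we have three values of Y, namely, the
actual sample value $y_i$, the estimated value $y_{ei}$ and the true expected value $\eta_i$,
these being connected by the relations

(11.5.3)    $y_i = \eta_i + \varepsilon_i = \alpha + \beta x_i + \varepsilon_i$
                $= y_{ei} + e_i = a + bx_i + e_i$

Now

            $(N - 1)s_{XY} = \sum (y_i - \bar{y})(x_i - \bar{x}) = \sum y_i(x_i - \bar{x})$

since $\sum (x_i - \bar{x}) = 0$. Writing $y_i = \alpha + \beta x_i + \varepsilon_i = \alpha + \beta(x_i - \bar{x}) + \beta\bar{x} + \varepsilon_i$,
we obtain $(N - 1)s_{XY} = \beta\sum (x_i - \bar{x})^2 + \sum \varepsilon_i(x_i - \bar{x})$ since the other terms
vanish. Therefore $(N - 1)s_{XY} = \beta(N - 1)s_X^2 + \sum \varepsilon_i(x_i - \bar{x})$ so that, on
dividing by $(N - 1)s_X^2$, $b = \beta + \sum \varepsilon_i(x_i - \bar{x})/[(N - 1)s_X^2]$. We have then

(11.5.4)    $b = \beta + \varepsilon_b$

where

(11.5.5)    $\varepsilon_b = \sum \dfrac{\varepsilon_i(x_i - \bar{x})}{(N - 1)s_X^2}$

Since the $\varepsilon_i$ are independent, each with variance $\sigma^2$,

(11.5.6)    $V(b) = V(\varepsilon_b) = \sigma^2 \sum \dfrac{(x_i - \bar{x})^2}{[(N - 1)s_X^2]^2} = \dfrac{\sigma^2}{(N - 1)s_X^2}$

Moreover, if $\varepsilon_i$ is normally distributed for each i, so is b, with expectation $\beta$ and
variance $\sigma^2/[(N - 1)s_X^2]$.

From Eq. (3),

(11.5.7)    $\varepsilon_i - e_i = (a - \alpha) + (b - \beta)x_i$

Also, from the first equation of (11.3.5), $\sum e_i = 0$, so that, on summing Eq.
(7) over i and dividing by N, we obtain

(11.5.8)    $(a - \alpha) + (b - \beta)\bar{x} = \bar{\varepsilon}$

From Eqs. (7), (8) and (4),

(11.5.9)    $\varepsilon_i - e_i = \bar{\varepsilon} + \varepsilon_b(x_i - \bar{x})$

Now, for any fixed value x,

            $y_e - \eta = a - \alpha + (b - \beta)x$
                $= \bar{\varepsilon} + \varepsilon_b(x - \bar{x})$
                $= \sum_i \left[\dfrac{1}{N} + \dfrac{(x - \bar{x})(x_i - \bar{x})}{(N - 1)s_X^2}\right]\varepsilon_i$

Since the $\varepsilon_i$ are independent, with mean 0 and common variance $\sigma^2$, it follows
that

(11.5.10)    $E(y_e) = \eta$

and

(11.5.11)    $V(y_e) = \sigma^2 \sum_i \left[\dfrac{1}{N} + \dfrac{(x - \bar{x})(x_i - \bar{x})}{(N - 1)s_X^2}\right]^2 = \sigma^2\left[\dfrac{1}{N} + \dfrac{(x - \bar{x})^2}{(N - 1)s_X^2}\right]$

since $\sum (x_i - \bar{x}) = 0$ and $\sum (x_i - \bar{x})^2 = (N - 1)s_X^2$.

This expression contains the unknown variance $\sigma^2$, which we need to estimate.
The minimum sum of squares $S_{min}$, by Eq. (9), may be written

(11.5.12)    $S_{min} = \sum e_i^2 = \sum [\varepsilon_i - \bar{\varepsilon} - \varepsilon_b(x_i - \bar{x})]^2$
                $= \sum \varepsilon_i^2 - N\bar{\varepsilon}^2 + (N - 1)s_X^2\varepsilon_b^2 - 2\varepsilon_b \sum \varepsilon_i(x_i - \bar{x})$
                $= \sum \varepsilon_i^2 - N\bar{\varepsilon}^2 - (N - 1)s_X^2\varepsilon_b^2$

Now $\varepsilon_i/\sigma$ is a standard normal variate, and the sum of N squares
$\sum \varepsilon_i^2/\sigma^2$ is distributed as $\chi^2$ with N degrees of freedom. Also, $N^{1/2}\bar{\varepsilon}/\sigma$
is a standard normal variate, and its square is distributed as $\chi^2$ with 1 d.f.
Finally, $(N - 1)^{1/2}s_X\varepsilon_b/\sigma$ is also a standard normal variate and its square is a $\chi^2$
variate with 1 d.f. These last two normal variates are both linear functions of
the $\varepsilon_i/\sigma$ and it is easily verified that they are orthogonal. It
follows from Fisher's Theorem (§ 4.7) that $S_{min}/\sigma^2$ is distributed as $\chi^2$ with N - 2
degrees of freedom and is independent of the last two terms on the right-hand
side of (12).

From the known properties of the chi-square distribution,

            $E(S_{min}/\sigma^2) = N - 2$

or

(11.5.13)    $E\left[\dfrac{S_{min}}{N - 2}\right] = \sigma^2$

An unbiased estimator of $\sigma^2$ is therefore

(11.5.14)    $\hat{\sigma}^2 = \dfrac{S_{min}}{N - 2} = \dfrac{(N - 1)s_Y^2(1 - r^2)}{N - 2}$

and on substituting this for $\sigma^2$ in Eq. (11) we obtain for the estimated variance
of $y_e$,

(11.5.15)    $\hat{V}(y_e) = \hat{\sigma}^2\left[\dfrac{1}{N} + \dfrac{(x - \bar{x})^2}{(N - 1)s_X^2}\right]$

This is a function of x, having its least value when $x = \bar{x}$.

We may be interested in the variance of Y about the estimated regression line,
that is, in the variance of $Y - y_e$ for some new assumed value x of X. Since the
variance of $y_e$ depends only on the N observations already made and the new
observation is independent of these, we may write

(11.5.16)    $V(Y - y_e) = V(Y|x) + V(y_e) = \sigma^2\left[1 + \dfrac{1}{N} + \dfrac{(x - \bar{x})^2}{(N - 1)s_X^2}\right]$

and an estimate of this variance is given by putting $\hat{\sigma}^2$, from Eq. (14), in place of
$\sigma^2$. The square root of $\hat{V}(Y - y_e)$ is called the standard error of estimate.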
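
As an illustration of Eqs. (11.5.2), (11.5.14), (11.5.15) and (11.5.16), the sketch below applies them to the nickel-steel data of Example 1 in § 11.4. It is only a sketch: the function names `var_fitted` and `se_estimate` are hypothetical, and the code assumes the model of § 11.3 (normal errors with common variance).

```python
import math

# Data of Example 1 (section 11.4): toughness X, percentage of nickel Y.
x = [47, 50, 52, 52, 54, 56, 58, 59, 60, 60, 62, 64, 65, 66]
y = [2.5, 2.7, 2.8, 2.8, 2.9, 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.7, 3.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)                          # (N - 1) s_X^2
syy = sum((yi - y_bar) ** 2 for yi in y)                          # (N - 1) s_Y^2
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))    # (N - 1) s_XY

b = sxy / sxx
a = y_bar - b * x_bar
r2 = sxy ** 2 / (sxx * syy)

# Minimum sum of squares, computed directly and from Eq. (11.5.2).
s_min_direct = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
s_min_formula = syy * (1.0 - r2)

sigma2_hat = s_min_direct / (n - 2)   # unbiased estimator, Eq. (11.5.14)

def var_fitted(x_new):
    """Estimated variance of y_e at x_new, Eq. (11.5.15)."""
    return sigma2_hat * (1.0 / n + (x_new - x_bar) ** 2 / sxx)

def se_estimate(x_new):
    """Standard error of estimate at x_new, from Eq. (11.5.16)."""
    return math.sqrt(sigma2_hat * (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx))

print(s_min_direct, s_min_formula, sigma2_hat)
print(var_fitted(x_bar), var_fitted(x_bar + 5), se_estimate(x_bar))
```

The two values of $S_{min}$ should agree to rounding error, and the estimated variance of $y_e$ is smallest at $x = \bar{x}$.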


11.6 Confidence Limits for the Parameters of Regression and for Estimated Y
Since the slope b of the regression line (on the hypotheses stated above) is a
normal variate with expectation $\beta$ and variance $\sigma^2/[(N - 1)s_X^2]$, and since an
independent unbiased estimate of $\sigma^2$ is furnished by $\hat{\sigma}^2$, with N - 2 degrees of
freedom, the quantity

            $(b - \beta)\left[\dfrac{(N - 1)s_X^2}{\hat{\sigma}^2}\right]^{1/2}$

has the Student-t distribution with N - 2 d.f.

Using Eq. (11.5.14) we can write the $100(1 - \alpha)\%$ confidence limits for $\beta$ as

(11.6.1)    $\beta = b \pm t_\alpha\left[\dfrac{s_Y^2(1 - r^2)}{(N - 2)s_X^2}\right]^{1/2}$

where $t_\alpha$ is the value of t exceeded numerically with probability $\alpha$. This may also
be written

(11.6.2)    $\beta = b \pm t_\alpha\,\dfrac{b}{r}\left[\dfrac{1 - r^2}{N - 2}\right]^{1/2}$

The variance of a is $\sigma^2\left[\dfrac{1}{N} + \dfrac{\bar{x}^2}{(N - 1)s_X^2}\right]$ and confidence limits for the true
regression parameter $\alpha$ may be found in a similar manner. However, it is
seldom important to know $\alpha$. In most regression problems it is the slope that
matters.

EXAMPLE 3 For a sample of size 27, we find that b = 0.163, r = 0.582.
What are the 95% confidence limits for $\beta$?

For 25 d.f., $t_{0.05}$ = 2.060. Therefore the limits are

$\beta = 0.163 \pm 2.060(0.0456)$
  = 0.069 and 0.257
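
The limits of Example 3 follow from Eq. (11.6.2) with the tabulated value $t_{0.05} = 2.060$ quoted above. A minimal Python sketch, in which the function name `slope_confidence_limits` is a hypothetical choice:

```python
import math

def slope_confidence_limits(b, r, n, t_alpha):
    """Confidence limits for beta from Eq. (11.6.2).

    t_alpha is the two-sided Student-t value for n - 2 d.f.,
    taken from a table by the caller.
    """
    std_err = (b / r) * math.sqrt((1.0 - r * r) / (n - 2))
    half_width = t_alpha * std_err
    return std_err, b - half_width, b + half_width

# Values of Example 3.
std_err, lower, upper = slope_confidence_limits(b=0.163, r=0.582, n=27, t_alpha=2.060)
print(round(std_err, 4), round(lower, 3), round(upper, 3))
```

The intermediate standard error corresponds to the factor 0.0456 appearing in the example.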

If $\beta = 0$ (and therefore $\rho = 0$), the quantity $r\left(\dfrac{N - 2}{1 - r^2}\right)^{1/2}$ has the Student-t
distribution with N - 2 d.f. This is sometimes useful in deciding whether an
observed value of r differs significantly from 0.

FIG. 51 CONFIDENCE BELT FOR ESTIMATE FROM LINEAR REGRESSION

EXAMPLE 4 For a sample of size 27, suppose that r = 0.348. Is this
significantly different from zero?

Here t = 1.856, with 25 d.f. The probability of a value numerically as great
as this is between 0.05 and 0.1, so that at the 5% level of significance the answer
to the question is "No." It takes a fairly large value of r to be significant with a
sample size as small as 27.
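
The value of t in Example 4 comes from the statistic $r[(N - 2)/(1 - r^2)]^{1/2}$. The short Python sketch below recomputes it; the function name is hypothetical, and the critical value 2.060 is the 5% two-sided value for 25 d.f. already quoted in Example 3.

```python
import math

def t_for_correlation(r, n):
    """Student-t statistic for testing rho = 0 from a sample correlation r."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

t = t_for_correlation(0.348, 27)
t_critical = 2.060   # 5% two-sided value for 25 d.f., as quoted in Example 3
significant = abs(t) > t_critical
print(round(t, 3), significant)
```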
Since the expectation of $Y - y_e$ is zero and its variance is given by Eq.
(11.5.16), it follows that

            $\dfrac{Y - y_e}{\hat{\sigma}\left[1 + \dfrac{1}{N} + \dfrac{(x - \bar{x})^2}{(N - 1)s_X^2}\right]^{1/2}}$

where $\hat{\sigma}^2$ is substituted for $\sigma^2$, has the t-distribution with N - 2 d.f. The
$100(1 - \alpha)\%$ confidence limits for Y for a given x are therefore

(11.6.3)    $Y = y_e \pm t_\alpha s_{Y \cdot x}$

where $y_e = a + bx$ and

            $s_{Y \cdot x}^2 = \hat{\sigma}^2\left[1 + \dfrac{1}{N} + \dfrac{(x - \bar{x})^2}{(N - 1)s_X^2}\right]$

The curves bounding the confidence belt are hyperbolas (Figure 51). For
large N, $s_{Y \cdot x}^2 \approx s_Y^2(1 - r^2)$, and the belt is almost of uniform width.


11.7 Regression when the Variable X Is Not Random As mentioned in
§ 11.3, it often happens in practice that the values of X in an experiment are pre-
selected, so as to be convenient numbers instead of being chosen at random.
Sometimes, also, the circumstances of observation practically dictate the values
of X, so that the observer has very little choice in the matter. The problem is
then not really one concerning two variates, but rather it concerns a single
variate Y which depends on a mathematical variable x. The assumption is that

(11.7.1)    $Y = \alpha + \beta x + \varepsilon_x$

where the $\varepsilon_x$ are normal, with expectation zero and a common variance $\sigma^2$ for
all x, and, for different values of x, are independent of one another. With this
assumption (see § 11.8) the maximum likelihood estimators of $\alpha$ and $\beta$ turn out
to be the a and b of § 11.3. Since, however, X is now not a random variable, we
must understand by $s_X^2$ not the estimator of the true variance of X but merely a
symbol for the quantity $\sum (x_i - \bar{x})^2/(N - 1)$, where $\bar{x} = \sum x_i/N$.

As we have seen earlier, when X and Y are both random there are in general
two distinct regression lines, one for estimating Y for given values of X, and the
other for estimating X for given values of Y. The former is still good when X is
pre-selected and not random (the least squares argument of § 11.3 is still valid)
but the latter has no meaning in this case.


It sometimes happens that we would like to estimate X for a given value of
Y even though X is not random. In testing an insecticide, for example, we may
wish to estimate the median lethal dose (the dose that will kill 50% of the time)
from observations made on the proportions of insects killed with various known
doses. The method is to invert the regression equation $y_e = a + bx$ and write

(11.7.2)    $x_0 = \dfrac{y_0 - a}{b}$

where $y_0$ is the given value of Y (see Figure 51). The $100(1 - \alpha)\%$ confidence
limits for x are $x_1$ and $x_2$, these being the abscissae of the points where the line
$Y = y_0$ cuts the confidence band. The values of $x_1$ and $x_2$ can be calculated
from (11.6.3) by putting $y_0 = a + bx \pm t_\alpha s_{Y \cdot x}$ and treating this equation as a
quadratic in x. The two roots are the required limits, provided that b is sufficiently
large. For values of b too small to be significantly different from zero
at the level $\alpha$, it may happen that no finite confidence interval for x exists, but
this case is not of much practical importance.
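
The quadratic mentioned above can be written out explicitly. Squaring $y_0 - a - bx = \pm t_\alpha s_{Y \cdot x}$ and putting $d = x - \bar{x}$, $D = y_0 - \bar{y}$, $c = t_\alpha^2\hat{\sigma}^2$ gives $[b^2 - c/((N-1)s_X^2)]d^2 - 2bDd + D^2 - c(1 + 1/N) = 0$. The Python sketch below solves it for the data of Example 1 in § 11.4. The choice $y_0 = 3.0$ is hypothetical, the value 2.179 is the usual tabulated 5% two-sided t for 12 d.f., and the variable names are assumptions; treat the whole block as an illustration rather than a prescribed procedure.

```python
import math

# Data of Example 1 (section 11.4).
x = [47, 50, 52, 52, 54, 56, 58, 59, 60, 60, 62, 64, 65, 66]
y = [2.5, 2.7, 2.8, 2.8, 2.9, 3.2, 3.2, 3.3, 3.4, 3.5, 3.5, 3.6, 3.7, 3.8]
y0 = 3.0          # hypothetical given value of Y
t_alpha = 2.179   # tabulated 5% two-sided t for N - 2 = 12 d.f.

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # (N - 1) s_X^2
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar
sigma2_hat = (syy - b * sxy) / (n - 2)   # S_min / (N - 2), Eq. (11.5.14)

c = t_alpha ** 2 * sigma2_hat
D = y0 - y_bar
A = b * b - c / sxx            # coefficient of d^2; must be positive
B = b * D
C = D * D - c * (1.0 + 1.0 / n)

x0 = (y0 - a) / b              # point estimate, Eq. (11.7.2)
if A > 0:
    root = math.sqrt(B * B - A * C)
    x1 = x_bar + (B - root) / A
    x2 = x_bar + (B + root) / A
else:
    x1 = x2 = None             # no finite interval: slope not significant

def band_gap(x_new):
    """(y0 - a - b*x)^2 minus t^2 * s^2 at x_new; zero on the confidence band."""
    s2 = sigma2_hat * (1.0 + 1.0 / n + (x_new - x_bar) ** 2 / sxx)
    return (y0 - a - b * x_new) ** 2 - t_alpha ** 2 * s2

print(x0, x1, x2)
```

When A is positive the discriminant is necessarily positive, so the two roots are real and enclose the point estimate $x_0$.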


* 11.8 Maximum Likelihood Estimation of the Regression Coefficients We
consider first the case when X is "fixed" (pre-selected), Y being given by Eq.
(11.7.1), with the assumptions there stated. If $\varepsilon_i$ denotes the value of $\varepsilon_x$ when
$x = x_i$, i = 1, 2, ..., N, the joint distribution of the N values $\varepsilon_i$ has the density
function

(11.8.1)    $g(\varepsilon_1, \ldots, \varepsilon_N) = (2\pi\sigma^2)^{-N/2} \exp\left(-\sum \dfrac{\varepsilon_i^2}{2\sigma^2}\right)$

Let the observed values of Y corresponding to the fixed values $x_i$ be denoted
by $y_i$. The joint density function for the $y_i$ is $f(y_1, y_2, \ldots, y_N)$, where

(11.8.2)    $f(y_1, y_2, \ldots, y_N)\,dy_1\,dy_2 \cdots dy_N = g(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)\,d\varepsilon_1 \cdots d\varepsilon_N$

The $y_i$ are related to the $\varepsilon_i$ by equations

(11.8.3)    $y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, N$

and the Jacobian of the transformation from the $\varepsilon_i$ to the $y_i$, for fixed $x_i$, is equal
to 1. It follows that

(11.8.4)    $f(y_1, \ldots, y_N) = (2\pi\sigma^2)^{-N/2} \exp\left[-\sum \dfrac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2}\right]$

The principle of maximum likelihood suggests that we choose $\alpha$ and $\beta$ to
maximize this function $f(y_1, \ldots, y_N)$. This is equivalent to minimizing
$\sum (y_i - \alpha - \beta x_i)^2$, and on differentiating partially with respect to $\alpha$ and $\beta$ and
equating the derivatives to zero, we arrive at the following equations for the
maximum likelihood estimators $\hat{\alpha}$ and $\hat{\beta}$:

(11.8.5)    $\sum (y_i - \hat{\alpha} - \hat{\beta}x_i) = 0$
            $\sum x_i(y_i - \hat{\alpha} - \hat{\beta}x_i) = 0$

These are identical with the equations of (11.3.5) so that $\hat{\alpha} = a$, $\hat{\beta} = b$. We get
the same result as by the method of least squares.
The above argument does not hold if X is a random variable. If the joint
probability density for X and Y is f(x, y), the likelihood for the observed sample
is

(11.8.6)    $L = f(x_1, y_1)\,f(x_2, y_2) \cdots f(x_N, y_N)$

If we transform to new variates U and $\varepsilon$ by the relations

(11.8.7)    $X = U$
            $Y = \alpha + \beta U + \varepsilon$

the Jacobian of the transformation is 1, and therefore Eq. (6) becomes

(11.8.8)    $L = f(u_1, \alpha + \beta u_1 + \varepsilon_1)\,f(u_2, \alpha + \beta u_2 + \varepsilon_2) \cdots f(u_N, \alpha + \beta u_N + \varepsilon_N)$

If this splits into two factors, one of which depends on the $\varepsilon_i$ alone and the
other on the $u_i$ alone, we must have

            $f(u, \alpha + \beta u + \varepsilon) = g(u) \cdot h(\varepsilon)$

or, equivalently,

(11.8.9)    $f(x, y) = g(x) \cdot h(y - \alpha - \beta x)$

If and only if this condition is satisfied, we can feel justified in estimating $\alpha$ and $\beta$
from the distribution $h(\varepsilon)$ of $\varepsilon$ alone. When this distribution is normal we get
back to the equations (5). The condition (9) is satisfied, for example, if the
population is bivariate normal. Using the standardized variates z and v, we see
from Eqs. (11.2.5), (11.2.6) and (11.2.8) that

(11.8.10)    $g(z, v) = f(z) \cdot g(v|z)$
                $= (2\pi)^{-1/2} e^{-z^2/2} \cdot (2\pi)^{-1/2}(1 - \rho^2)^{-1/2} \exp\left[-\dfrac{(v - \rho z)^2}{2(1 - \rho^2)}\right]$

The first factor is a function of z alone and the second of $v - \rho z$ alone. In these
variates $v - \rho z$ is equivalent to $y - \alpha - \beta x$.

It should be observed that when condition (9) is satisfied the regression is
necessarily linear. For

(11.8.11)    $E(Y|X = x) = \int y \cdot h(y - \alpha - \beta x)\,dy$
                $= \int (y - \alpha - \beta x)\,h(y - \alpha - \beta x)\,dy + (\alpha + \beta x)\int h(y - \alpha - \beta x)\,dy$
                $= E(\varepsilon) + (\alpha + \beta x)$

since $h(\varepsilon)$ is a density function.

The first term can be taken as zero by suitably adjusting $\alpha$, and we thus have
the ordinary equation for linear regression.


11.9 Functional Relation Between Variables Subject to Error  It often happens that the variables $X$ and $Y$, whether "fixed" or random, are subject to experimental error. With random variables this error is mixed up with the fluctuation due to the underlying probability distribution. We will therefore suppose to begin with that, apart from the error, $X$ and $Y$ are fixed and that $Y$ is a linear function of $X$. That is,

(11.9.1)  $$X = \xi + u, \qquad Y = \eta + v$$

where

(11.9.2)  $$\eta = \alpha + \beta\xi$$

The quantities $\xi$ and $\eta$ are regarded as the true values of the variables, and $u$ and $v$ as the errors. We suppose that $u$ and $v$ are uncorrelated with each other, and, for any value of $\xi$, are distributed normally with means zero and variances $\sigma_u^2$ and $\sigma_v^2$ respectively.

The joint density function for $X$ and $Y$ is then

(11.9.3)  $$f(x, y) = (2\pi\sigma_u\sigma_v)^{-1} \exp\left[-\frac{(x - \xi)^2}{2\sigma_u^2} - \frac{(y - \eta)^2}{2\sigma_v^2}\right]$$

The likelihood function for a set of $N$ pairs of observed values $(x_i, y_i)$ is

(11.9.4)  $$L = (2\pi\sigma_u\sigma_v)^{-N} \exp\left[-\frac{\sum (x_i - \xi_i)^2}{2\sigma_u^2} - \frac{\sum (y_i - \alpha - \beta\xi_i)^2}{2\sigma_v^2}\right]$$

so that

(11.9.5)  $$\log L = C - N \log \sigma_u - N \log \sigma_v - \frac{1}{2\sigma_u^2} \sum (x_i - \xi_i)^2 - \frac{1}{2\sigma_v^2} \sum (y_i - \alpha - \beta\xi_i)^2$$

The right-hand side contains $N + 4$ unknown parameters, namely $\alpha$, $\beta$, $\sigma_u^2$, $\sigma_v^2$, and the $N$ values $\xi_i$. The maximum likelihood equations, found by differentiating partially with respect to each of these parameters and setting the derivatives equal to zero, are

(11.9.6)  $$\sum (y_i - \alpha - \beta\xi_i) = 0$$
(11.9.7)  $$\sum \xi_i (y_i - \alpha - \beta\xi_i) = 0$$
(11.9.8)  $$\sum (x_i - \xi_i)^2 = N\sigma_u^2$$
(11.9.9)  $$\sum (y_i - \alpha - \beta\xi_i)^2 = N\sigma_v^2$$
(11.9.10)  $$\frac{x_i - \xi_i}{\sigma_u^2} + \frac{\beta (y_i - \alpha - \beta\xi_i)}{\sigma_v^2} = 0, \qquad i = 1, 2, \ldots, N$$


From Eqs. (8) and (10) we find

$$N\sigma_u^2 = \frac{\beta^2 \sigma_u^4}{\sigma_v^4} \sum (y_i - \alpha - \beta\xi_i)^2$$

and, on substituting from Eq. (9),

(11.9.11)  $$\beta^2 = \sigma_v^2 / \sigma_u^2$$

Since this relation cannot be supposed to hold in general, the maximum likelihood method is not satisfactory unless some further assumption is made. The most convenient one to make is that the ratio of $\sigma_v^2$ to $\sigma_u^2$ is definitely known. If this ratio is denoted by $\lambda$, and if we let $\sigma_u^2 = \sigma^2$, $\sigma_v^2 = \lambda\sigma^2$, we obtain, instead of Eqs. (8) and (9), the one relation

(11.9.12)  $$2N\lambda\sigma^2 = \lambda \sum (x_i - \xi_i)^2 + \sum (y_i - \alpha - \beta\xi_i)^2$$

and instead of Eq. (10)

(11.9.13)  $$\lambda (x_i - \xi_i) + \beta (y_i - \alpha - \beta\xi_i) = 0$$

Substituting from Eq. (13) in Eqs. (6) and (7), we find, after some rearrangement of terms, that

(11.9.14)  $$N\alpha + \beta \sum x_i = \sum y_i$$

and

(11.9.15)  $$\alpha \sum x_i + \beta \sum x_i^2 = \sum x_i y_i + \frac{\beta}{\lambda} \left(\sum y_i^2 - \alpha \sum y_i - \beta \sum x_i y_i\right)$$

By eliminating $\alpha$ from these two equations we obtain a quadratic equation in $\beta$, which reduces to

(11.9.16)  $$s_{xy}\beta^2 + (\lambda s_x^2 - s_y^2)\beta - \lambda s_{xy} = 0$$

where $s_x^2$, $s_y^2$ and $s_{xy}$ are the variances and covariance of the observed sample values $x_i$ and $y_i$. The estimator of $\beta$ is

(11.9.17)  $$\hat{\beta} = \frac{s_y^2 - \lambda s_x^2 + \left[(s_y^2 - \lambda s_x^2)^2 + 4\lambda s_{xy}^2\right]^{1/2}}{2 s_{xy}}$$

The estimator of $\alpha$ is found from Eq. (14) and that of $\sigma^2$ from Eq. (12). It turns out that

(11.9.18)  $$\hat{\sigma}^2 = \frac{N - 1}{2N} \cdot \frac{s_y^2 - 2\hat{\beta} s_{xy} + \hat{\beta}^2 s_x^2}{\lambda + \hat{\beta}^2}$$

Unfortunately, as Lindley [1] has shown, this is not a consistent estimator of $\sigma^2$. It tends in probability, as $N$ increases, to the value $\sigma^2/2$.

If we use the least squares method, it is no longer correct, as in the classical regression problem, to minimize the sum of squares of deviations parallel to the y-axis. Since the values of $X$ observed are no longer the true values, the variance of $X$ must be taken into consideration. One way is to minimize the sum of squares of the $X$ deviations, weighted inversely as $\sigma_u^2$, plus the sum of squares of the $Y$ deviations, weighted inversely as $\sigma_v^2$. That is, we minimize

$$\lambda \sum (x_i - \xi_i)^2 + \sum (y_i - \alpha - \beta\xi_i)^2$$

with respect to $\alpha$, $\beta$ and the $\xi_i$. Another method is to weight the squared deviation of $y_i$ from $\alpha + \beta x_i$ with a weight inversely proportional to the variance of $y_i - \alpha - \beta x_i$. Since this variance is $\sigma_v^2 + \beta^2\sigma_u^2 = \sigma_u^2(\lambda + \beta^2)$, the expression to be minimized is

$$\frac{\sum (y_i - \alpha - \beta x_i)^2}{\lambda + \beta^2}$$

Differentiating partially with respect to $\alpha$ and $\beta$ ($\lambda$ is supposed known, as in the maximum likelihood method), we obtain

(11.9.19)  $$\bar{y} = \hat{\alpha} + \hat{\beta}\bar{x}$$

and

(11.9.20)  $$\hat{\beta} \sum (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 + (\lambda + \hat{\beta}^2) \sum x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0$$

If we imagine the origin shifted to the point $(\bar{x}, \bar{y})$, which will not affect the slope of the line, we can put $\hat{\alpha} = 0$, from Eq. (19), and then

(11.9.21)  $$\hat{\beta} \sum (y_i - \hat{\beta} x_i)^2 + (\lambda + \hat{\beta}^2) \sum x_i (y_i - \hat{\beta} x_i) = 0$$

which can be written

$$(\lambda - \hat{\beta}^2) \sum x_i y_i + \hat{\beta} \sum (y_i^2 - \lambda x_i^2) = 0$$

With the new origin, $\sum x_i^2$, $\sum y_i^2$ and $\sum x_i y_i$ are proportional to $s_x^2$, $s_y^2$ and $s_{xy}$ respectively, so that

(11.9.22)  $$(\lambda - \hat{\beta}^2) s_{xy} + \hat{\beta} (s_y^2 - \lambda s_x^2) = 0$$

which is the same as Eq. (16). The estimator $\hat{\beta}$ is therefore the same as that furnished by the method of maximum likelihood, but now a consistent estimator of $\sigma^2$ is obtainable by dividing the minimum sum of squares by the number of degrees of freedom, $N - 2$. That is,

$$\hat{\sigma}^2 = \frac{1}{N - 2} \cdot \frac{\sum (y_i - \hat{\beta} x_i)^2}{\lambda + \hat{\beta}^2} = -\frac{1}{N - 2} \cdot \frac{\sum x_i (y_i - \hat{\beta} x_i)}{\hat{\beta}}$$

by Eq. (21). Therefore

(11.9.23)  $$\hat{\sigma}^2 = \frac{N - 1}{N - 2} \cdot \frac{\hat{\beta} s_x^2 - s_{xy}}{\hat{\beta}} = \frac{N - 1}{(N - 2)\lambda} (s_y^2 - \hat{\beta} s_{xy}), \quad \text{by Eq. (22)}$$

A good discussion of the functional relation between variables subject to error may be found in reference [2].
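The estimators of this section are easy to compute once $\lambda$ is fixed. The following is a minimal sketch, assuming sample variances and covariance with divisor $N - 1$ and a nonzero sample covariance; the function names and data are hypothetical. It evaluates the slope from Eq. (11.9.17), the intercept from Eq. (11.9.14), and the variance estimator from Eq. (11.9.23).

```python
import math


def sample_moments(xs, ys):
    """Means, variances and covariance (divisor N - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, sxx, syy, sxy


def functional_fit(xs, ys, lam):
    """Slope, intercept and sigma^2 estimate when lam = var(v)/var(u) is known."""
    n = len(xs)
    mx, my, sxx, syy, sxy = sample_moments(xs, ys)
    a = syy - lam * sxx
    beta = (a + math.sqrt(a * a + 4.0 * lam * sxy * sxy)) / (2.0 * sxy)  # (11.9.17)
    alpha = my - beta * mx                                              # (11.9.14)
    sigma2 = (n - 1) / ((n - 2) * lam) * (syy - beta * sxy)             # (11.9.23)
    return alpha, beta, sigma2


# Hypothetical observations, with lam assumed known and equal to 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
lam = 1.0
alpha_hat, beta_hat, sigma2_hat = functional_fit(xs, ys, lam)
```

For points lying exactly on a line the slope is recovered whatever value of $\lambda$ is supplied, and the variance estimate is zero.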


* 11.10 The Regression Relation Between Variables Subject to Error  If $Y$ is a random variable, subject also to error, and if $X$ is "fixed," that is, pre-selected, the relations corresponding to Eqs. (11.9.1) and (11.9.2) are

(11.10.1)  $$X = \xi + u, \qquad Y = \eta + \epsilon + v, \qquad \eta = \alpha + \beta\xi$$

If $(x_i, y_i)$ are the observed pairs of values,

(11.10.2)  $$x_i = \xi_i + u_i, \qquad y_i = \eta_i + \epsilon_i + v_i, \qquad \eta_i = \alpha + \beta\xi_i$$

where successive observations are independent, the $u_i$ and $v_i$ are uncorrelated with each other and with the $\epsilon_i$, and where $u_i$, $v_i$ and $\epsilon_i$ have zero expectations and variances $\sigma_u^2$, $\sigma_v^2$, $\sigma_\epsilon^2$, for all $i$. Then

(11.10.3)  $$\sigma_x^2 = \sigma_u^2, \qquad \sigma_y^2 = \sigma_\epsilon^2 + \sigma_v^2$$

The analysis of § 11.9 then holds with $\sigma_y^2$ in place of $\sigma_v^2$. The only difference is that the variance of $Y$ is now partly due to its inherent nature as a random variable and partly due to experimental error.

If we are interested in estimating the regression of $Y$ on the observed $X$ rather than on the true $\xi$, we simply treat the $X$ as being without error. That is, we use the technique of § 11.7, and our estimator of $\beta$ is the ordinary one obtained in § 11.3, namely,

(11.10.4)  $$\hat{\beta} = b = \frac{s_{xy}}{s_x^2}$$

This is obtained as a special case of (11.9.16) when $\sigma_u^2 = 0$ and therefore $\lambda = \infty$.

If both $X$ and $Y$ are random variables (the structural situation mentioned in § 11.3), the equations (1) still hold, but $\xi$ is now a random variable with expectation $\mu$ and variance $\sigma_\xi^2$. Then $\eta$ is also a random variable with expectation $\alpha + \beta\mu$ and variance $\beta^2\sigma_\xi^2$. On the assumption that $u$, $v$ and $\epsilon$ are uncorrelated with each other and with $\xi$, we have

(11.10.5)  $$\sigma_x^2 = \sigma_\xi^2 + \sigma_u^2, \qquad \sigma_y^2 = \beta^2\sigma_\xi^2 + \sigma_\epsilon^2 + \sigma_v^2, \qquad \sigma_{xy} = \beta\sigma_\xi^2$$

The sum $\sigma_\epsilon^2 + \sigma_v^2$ may be denoted by $\sigma_w^2$, as the two components cannot be distinguished. The maximum likelihood treatment of this case is discussed in reference [3]. Estimators of $\mu$, $\alpha$ and $\beta$ are connected by the relations

(11.10.6)  $$\hat{\mu} = \bar{x}, \qquad \hat{\alpha} + \hat{\beta}\bar{x} = \bar{y}$$

but in order to find $\hat{\beta}$ it is necessary to assume that either $\sigma_u^2$, $\sigma_w^2$ or the ratio $\lambda = \sigma_w^2/\sigma_u^2$ is known. If $\lambda$ is known, $\hat{\beta}$ is given by

(11.10.7)  $$\hat{\beta} = \theta + (\theta^2 + \lambda)^{1/2}, \qquad \theta = \frac{s_y^2 - \lambda s_x^2}{2 s_{xy}}$$


This is the same as Eq. (11.9.17). An estimator of $\sigma_\xi^2$ is given by

(11.10.8)  $$\hat{\sigma}_\xi^2 = \frac{(N - 1)\, s_{xy}}{N \hat{\beta}}$$


11.11 The Method of Grouping  A very simple method of fitting a straight line, when $X$ and $Y$ are functionally related but subject to error, has been suggested by several writers, notably A. Wald and M. S. Bartlett (see references [4] and [5]).

The observed $N$ pairs are ordered, usually by reference to the $X$ values, and divided into three groups. A number $p$ is chosen ($p \le \tfrac{1}{2}$ and such that $Np$ is integral) and then the first $Np$ observations are put in group $G_1$, the last $Np$ in group $G_3$, and the remainder in group $G_2$. Wald suggested taking $p$ as near as possible to $\tfrac{1}{2}$, so that $G_2$ was either empty or contained one observation. Bartlett suggested taking $p$ approximately $\tfrac{1}{3}$, which in general gives greater accuracy. The exact value is not very important, but studies [5] indicate that the numbers in the three groups $G_1$, $G_2$ and $G_3$ should be nearly in the ratio 1 : 2 : 1 for maximum efficiency.

The method consists in plotting the points $A$ and $B$, which are the centroids of the two groups of points $G_1$ and $G_3$, and joining $AB$. The coordinates of $A$ and $B$ are the group means $(\bar{x}_1, \bar{y}_1)$ and $(\bar{x}_3, \bar{y}_3)$. The slope of $AB$ is an estimator of $\beta$. The line parallel to $AB$ through the over-all mean $(\bar{x}, \bar{y})$, at $C$, is the line required, with equation

(11.11.1)  $$y - \bar{y} = \hat{\beta}(x - \bar{x})$$

where

(11.11.2)  $$\hat{\beta} = \frac{\bar{y}_3 - \bar{y}_1}{\bar{x}_3 - \bar{x}_1}$$

(see Figure 52).

This quantity $\hat{\beta}$ is a consistent estimator of $\beta$ if two conditions are satisfied: (1) the grouping should be independent of the errors of observation, (2) the quantity $\bar{x}_3 - \bar{x}_1$ should not approach zero as $N \to \infty$. The second condition is obviously satisfied if the observations are ordered according to their increasing true values $\xi$. Unfortunately we do not know the true values and the order of the observed values $x$ may not be independent of the errors.

The precise conditions for the consistency of $\hat{\beta}$ are difficult to satisfy, particularly if we assume that the error in $X$ is normally distributed. Theoretically an error of any magnitude whatever is possible, since the range of a normal variate is infinite. Practically the probability of an error numerically greater than $\delta = 4\sigma_u$ is negligible. If the values of $\xi$ corresponding to cumulative probabilities $p$ and $1 - p$ are denoted by $\xi_p$ and $\xi_{1-p}$, the grouping by observed values will be practically the same as grouping by true values, provided that scarcely any observed values fall in the intervals

$$[\xi_p - \delta,\ \xi_p + \delta] \quad \text{and} \quad [\xi_{1-p} - \delta,\ \xi_{1-p} + \delta].$$

As in § 11.9, we assume that the errors $u = X - \xi$ are independent and normal with common variance $\sigma_u^2$, the errors $v = Y - \eta$ are independent and normal with common variance $\sigma_v^2$, and $u$ and $v$ are uncorrelated with each other. Then it is possible to determine confidence limits for $\beta$.

From Eq. (2), we have

(11.11.3)  $$(\bar{x}_3 - \bar{x}_1)(\hat{\beta} - \beta) = \bar{y}_3 - \bar{y}_1 - \beta(\bar{x}_3 - \bar{x}_1) = (\bar{v}_3 - \beta\bar{u}_3) - (\bar{v}_1 - \beta\bar{u}_1)$$

since $y_i = \eta_i + v_i = \alpha + \beta\xi_i + v_i$ and $x_i = \xi_i + u_i$. The variance of $\bar{v}_3$ or $\bar{v}_1$ is $\sigma_v^2/k$, where $k = Np$, and that of $\bar{u}_3$ or $\bar{u}_1$ is $\sigma_u^2/k$, so that the variance of the right-hand side of Eq. (3) is $(\sigma_v^2 + \beta^2\sigma_u^2)(2/k)$.

For the points in group $G_1$,

$$y_i - \bar{y}_1 - \beta(x_i - \bar{x}_1) = v_i - \bar{v}_1 - \beta(u_i - \bar{u}_1)$$

so that

(11.11.4)  $$\sum \left[y_i - \bar{y}_1 - \beta(x_i - \bar{x}_1)\right]^2 = \sum (v_i - \bar{v}_1)^2 + \beta^2 \sum (u_i - \bar{u}_1)^2 - 2\beta \sum (v_i - \bar{v}_1)(u_i - \bar{u}_1)$$

and the left-hand side is therefore an estimator of $(k - 1)(\sigma_v^2 + \beta^2\sigma_u^2)$. The corresponding sums over $G_2$ and $G_3$ are estimators of $(N - 2k - 1)(\sigma_v^2 + \beta^2\sigma_u^2)$ and $(k - 1)(\sigma_v^2 + \beta^2\sigma_u^2)$ respectively. The three sums combined give an estimator of $(N - 3)(\sigma_v^2 + \beta^2\sigma_u^2)$.

If we write

(11.11.5)  $$(N - 3)s_x^2 = \sum_{G_1} (x_i - \bar{x}_1)^2 + \sum_{G_2} (x_i - \bar{x}_2)^2 + \sum_{G_3} (x_i - \bar{x}_3)^2$$

with similar expressions for $(N - 3)s_y^2$ and $(N - 3)s_{xy}$, the estimator of $\sigma_v^2 + \beta^2\sigma_u^2$ is $s_y^2 - 2\beta s_{xy} + \beta^2 s_x^2 = S(\beta)$, say.

Since on our assumptions (see Eq. (3) above) the statistic $(\bar{x}_3 - \bar{x}_1)(\hat{\beta} - \beta)$ is normally distributed about zero, and since an independent estimator of its variance with $N - 3$ d.f. is $2S(\beta)/k$, the quantity

(11.11.6)  $$t = \frac{(\bar{x}_3 - \bar{x}_1)(\hat{\beta} - \beta)}{[2S(\beta)/k]^{1/2}}$$

has the Student-t distribution with $N - 3$ degrees of freedom. If $t_\alpha$ is the value of $t$ exceeded in absolute value with probability $\alpha$, the quadratic equation

(11.11.7)  $$\frac{2t_\alpha^2}{k} \left[s_y^2 - 2\beta s_{xy} + \beta^2 s_x^2\right] = (\bar{x}_3 - \bar{x}_1)^2 (\hat{\beta} - \beta)^2$$

where $\hat{\beta}$ is given by Eq. (2), provides $100(1 - \alpha)\%$ confidence limits for $\beta$. If $\beta_u$ is the upper 95% limit, a rough estimate of the standard deviation of $\hat{\beta}$ is given by $(\beta_u - \hat{\beta})/t_{0.05}$.

EXAMPLE 5  The friction (ounces) in a simple machine is $y$ when the load is $x$ ounces.

x |  23.4   44.7   65.4   86.8  107.5  128.8  149.6  171.0
y |   3.4    4.7    5.4    6.8    7.5    8.8    9.6   11.0

Taking the first two and the last two measurements, we have $\bar{x}_1 = 34.05$, $\bar{x}_3 = 160.3$, $\bar{y}_1 = 4.05$, $\bar{y}_3 = 10.3$; also $\bar{x} = 97.15$, $\bar{y} = 7.15$. Therefore $\hat{\beta} = 0.0495$, and the line fitted is $y = 0.0495x + 2.34$. We also find from Eq. (5) that $5s_x^2 = 226.8 + 2224.0 + 229.0 = 2679.8$, $5s_y^2 = 7.8525$, $5s_{xy} = 143.85$, so that $S(\hat{\beta}) = 1.5705 - 57.54\hat{\beta} + 535.96\hat{\beta}^2 = 0.0354$.

The 95% confidence limits for $\beta$ are given by Eq. (7) with $k = 2$, $t_\alpha = 2.571$. This equation reduces to $\beta^2 - 0.09661\beta + 0.002313 = 0$, the roots of which are $\beta = 0.0437$ and $0.0529$.

If we use Wald's method with two groups of four observations each, the central group $G_2$ is empty and $N - 3$ in Eq. (5) must be replaced by $N - 2$. We find $\bar{x}_1 = 55.075$, $\bar{x}_3 = 139.225$, $\bar{y}_1 = 5.075$, $\bar{y}_3 = 9.225$, with $\bar{x}$ and $\bar{y}$ as before. The line fitted is $y = 0.0493x + 2.36$. Also $6s_x^2 = 4456.5$, $6s_y^2 = 12.48$, $6s_{xy} = 234.48$, so that $S(\hat{\beta}) = 2.08 - 78.16\hat{\beta} + 742.75\hat{\beta}^2 = 0.0321$ with $\hat{\beta} = 0.0493$.

The 95% limits given by Eq. (7), with $k = 4$, are 0.0430 and 0.0526; this method, therefore, in spite of using all the observations, gives a slightly less reliable value of the slope than the method which uses only the first two and last two observations.
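The arithmetic of Example 5 can be reproduced with a short routine. This is a sketch under stated assumptions: the function name `grouping_fit` and its arguments are hypothetical, the critical value $t_\alpha$ is supplied by the caller from tables, and the quadratic (11.11.7) is assumed to have two real roots (otherwise `math.sqrt` raises an error, which corresponds to the case where no finite interval exists). The y-values are those of the table as read together with the group means quoted in the example.

```python
import math


def mean(values):
    return sum(values) / len(values)


def grouping_fit(xs, ys, k, t_alpha):
    """Grouping-method line and confidence limits for the slope.

    k is the size of each end group (k = Np); t_alpha is the two-sided
    Student-t critical value for the pooled degrees of freedom.
    """
    pairs = sorted(zip(xs, ys))                      # order by observed x
    n = len(pairs)
    groups = [g for g in (pairs[:k], pairs[k:n - k], pairs[n - k:]) if g]
    low, high = groups[0], groups[-1]
    x1, y1 = mean([x for x, _ in low]), mean([y for _, y in low])
    x3, y3 = mean([x for x, _ in high]), mean([y for _, y in high])
    beta = (y3 - y1) / (x3 - x1)                     # Eq. (11.11.2)
    alpha = mean(ys) - beta * mean(xs)               # line through the over-all mean

    # Pooled within-group sums of squares and products, Eq. (11.11.5).
    sxx = syy = sxy = 0.0
    for g in groups:
        gx, gy = mean([x for x, _ in g]), mean([y for _, y in g])
        sxx += sum((x - gx) ** 2 for x, _ in g)
        syy += sum((y - gy) ** 2 for _, y in g)
        sxy += sum((x - gx) * (y - gy) for x, y in g)
    df = n - len(groups)                             # N - 3, or N - 2 if G2 is empty
    sxx, syy, sxy = sxx / df, syy / df, sxy / df

    # Eq. (11.11.7) rearranged as qa*b^2 + qb*b + qc = 0.
    c = 2.0 * t_alpha ** 2 / k
    d2 = (x3 - x1) ** 2
    qa = d2 - c * sxx
    qb = 2.0 * c * sxy - 2.0 * d2 * beta
    qc = d2 * beta ** 2 - c * syy
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    limits = sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)))
    return alpha, beta, limits


load = [23.4, 44.7, 65.4, 86.8, 107.5, 128.8, 149.6, 171.0]
friction = [3.4, 4.7, 5.4, 6.8, 7.5, 8.8, 9.6, 11.0]
alpha_hat, beta_hat, (beta_low, beta_high) = grouping_fit(load, friction, k=2, t_alpha=2.571)
```

Calling the same function with `k=4` and the 6-d.f. critical value gives the Wald version of the fit, since the empty middle group is dropped and the divisor becomes $N - 2$.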
11.12 The Distribution of the Pearson Correlation Coefficient  We have seen that if the parent population is uncorrelated (i.e., $\rho = 0$) and if $X$ and $Y$ are normally distributed, the quantity

(11.12.1)  $$t = r (N - 2)^{1/2} (1 - r^2)^{-1/2}$$

has the Student-t distribution with $N - 2$ d.f. If $u = r^2/(1 - r^2) = t^2/(N - 2)$, it follows from the result of § 8.5 that $u$ is distributed like the ratio of two independent $\chi^2$ variates with 1 and $N - 2$ d.f. respectively. This in turn means (see § 4.5) that $u$ is a beta-prime variate with parameters $\tfrac{1}{2}$ and $(N - 2)/2$. Its density function is therefore

(11.12.2)  $$f(u) = \frac{u^{-1/2} (1 + u)^{-(N-1)/2}}{B\left(\tfrac{1}{2}, \tfrac{N-2}{2}\right)}$$

The density function for $r$ is obtained by putting $2g(r)\, dr = f(u)\, du$. The factor 2 arises because, as $r$ goes from $-1$ to 1, $u$ goes from $+\infty$ to 0 and back to $+\infty$. Since $du/dr = 2r(1 - r^2)^{-2}$, we have

(11.12.3)  $$g(r) = \frac{(1 - r^2)^{(N-4)/2}}{B\left(\tfrac{1}{2}, \tfrac{N-2}{2}\right)}$$

The graph of $g(r)$ is a symmetrical bell-shaped curve for $N > 5$. Because of the symmetry, $E(r) = 0$, and it is easily proved that

(11.12.4)  $$E(r^2) = (N - 1)^{-1}$$

The standard deviation of $r$ is therefore $(N - 1)^{-1/2}$.

The kurtosis is $-6/(N + 1)$, which tends to zero as $N$ increases. For very large $N$ the distribution is approximately normal.

It is not necessary to assume a bivariate normal distribution. Provided that $X$ and $Y$ are independent and that at least one is a random sample from a univariate normal distribution, the distribution of Eq. (3) holds.

When $\rho$ is not zero, the exact distribution of $r$ is quite complicated. It was first found by Fisher, using an essentially geometrical argument. An analytical treatment may be found in [7], and a full discussion by Hotelling [8] is probably the last word for some time on this subject. The density function for $r$ is

(11.12.5)  $$f(r, \rho) = \frac{N - 2}{\pi} (1 - \rho^2)^{(N-1)/2} (1 - r^2)^{(N-4)/2}\, I(\rho r)$$

where $I(\rho r)$ is given by

(11.12.6)  $$I(\rho r) = \int_0^\infty \frac{du}{(\cosh u - \rho r)^{N-1}}$$

The integral can be expressed as a rapidly convergent series, and we obtain

(11.12.7)  $$f(r, \rho) = \frac{(N - 2)\,\Gamma(N - 1)}{(2\pi)^{1/2}\,\Gamma(N - \tfrac{1}{2})} (1 - \rho^2)^{(N-1)/2} (1 - r^2)^{(N-4)/2} (1 - \rho r)^{-(N - 3/2)}\, S(\rho r)$$

where

(11.12.8)  $$S(\rho r) = 1 + \frac{1}{4} \cdot \frac{\rho r + 1}{2N - 1} + \frac{9}{32} \cdot \frac{(\rho r + 1)^2}{(2N - 1)(2N + 1)} + \cdots$$

Tables of the function $f(r, \rho)$ and of its integral have been prepared by Miss F. N. David [9], who included also several charts from which approximate confidence limits for $\rho$ may readily be obtained.

The distribution of $r$ is far from normal, particularly when $\rho$ is near $+1$ or $-1$. Series expressions may be found for the cumulants of this distribution, as follows (with $n$ written for $N - 1$):

(11.12.9)
$$\kappa_1 = E(r) = \rho - \frac{\rho(1 - \rho^2)}{2n} + \cdots$$
$$\kappa_2 = V(r) = \frac{(1 - \rho^2)^2}{n} \left(1 + \frac{11\rho^2}{2n} + \cdots\right)$$
$$\gamma_1 = -\frac{6\rho}{n^{1/2}} \left(1 + \frac{77\rho^2 - 30}{12n} + \cdots\right)$$
$$\gamma_2 = \frac{6}{n} (12\rho^2 - 1) + \cdots$$

Thus if $\rho = 0.8$ and $N = 50$ we find that $\gamma_1 = -0.71$ and $\gamma_2 = 0.82$.
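Two of the results in this section lend themselves to a quick numerical check: the moments of the null density (11.12.3), and the skewness and kurtosis quoted for $\rho = 0.8$, $N = 50$. The sketch below uses a simple midpoint rule; the function names, the step count and the sample size 10 used for the moment check are assumptions made for illustration.

```python
import math


def null_density(r, n_obs):
    """g(r) of Eq. (11.12.3) for rho = 0 and sample size n_obs."""
    beta_fn = math.gamma(0.5) * math.gamma((n_obs - 2) / 2) / math.gamma((n_obs - 1) / 2)
    return (1.0 - r * r) ** ((n_obs - 4) / 2) / beta_fn


def moment(power, n_obs, steps=20000):
    """Midpoint-rule estimate of E(r**power) under the null density."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        r = -1.0 + (i + 0.5) * h
        total += r ** power * null_density(r, n_obs)
    return total * h


def r_shape(rho, n_obs):
    """Leading terms of the series (11.12.9) for gamma_1 and gamma_2."""
    n = n_obs - 1
    g1 = -6.0 * rho / math.sqrt(n) * (1.0 + (77.0 * rho ** 2 - 30.0) / (12.0 * n))
    g2 = 6.0 * (12.0 * rho ** 2 - 1.0) / n
    return g1, g2


m2 = moment(2, 10)                  # should be close to 1/(N - 1) = 1/9
excess = moment(4, 10) / m2 ** 2 - 3.0
gamma1, gamma2 = r_shape(0.8, 50)
```

The excess kurtosis computed this way can be compared with $-6/(N + 1)$, and the two shape coefficients with the values quoted at the end of the section.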


11.13 Fisher's Transformation  Fisher showed that if we transform to a new variable $z'$ by the relation

(11.13.1)  $$z' = \tanh^{-1} r = \tfrac{1}{2} \log_e \frac{1 + r}{1 - r}$$

then $z'$ is approximately normally distributed with variance $1/(N - 3)$, whatever the value of $\rho$. This remark enables us to assess readily the significance of an observed value of $r$, without having to use David's tables.

If $\zeta = \tanh^{-1} \rho$, it may be proved that

(11.13.2)  $$E(z') = \zeta + \frac{\rho}{2n} + O\left(\frac{1}{n^2}\right)$$

where $n = N - 1$. Also

(11.13.3)  $$V(z') = \frac{1}{n} + \frac{4 - \rho^2}{2n^2} + O\left(\frac{1}{n^3}\right)$$

(11.13.4)  $$\gamma_1(z') = \frac{\rho^3}{n^{3/2}} + O\left(\frac{1}{n^{5/2}}\right)$$

(11.13.5)  $$\gamma_2(z') = \frac{2}{n} + O\left(\frac{1}{n^2}\right)$$

For $\rho = 0.8$ and $N = 50$, $\gamma_1(z') = 0.0015$ and $\gamma_2(z') = 0.042$. This shows clearly, when compared with the values given at the end of § 11.12, how much more nearly normal $z'$ is than $r$.

From Eq. (3), the variance depends to some extent on $\rho$. For $\rho = 0$ it is approximately $1/(n - 2)$ and for $\rho = \pm 1$ approximately $1/(n - 3/2)$. The Fisher value $1/(n - 2)$ is therefore reasonably close for any $\rho$.


EXAMPLE 6  For 20 students the correlation coefficient between scores on two tests was 0.65. What are the 95% confidence limits for $\rho$?

Assuming a normal distribution for $z'$, the 95% limits will be $z' \pm 1.96/\sqrt{17}$. Since the expectation of $z'$ is $\zeta + \rho/38$, the limits for $\zeta$ will be $z' - \rho/38 \pm 1.96/\sqrt{17}$. Now $z' = \tanh^{-1} 0.65 = 0.775$, but $\rho$ is unknown. However, since $\rho/38$ is small, we can substitute for $\rho$ the sample value 0.65, and this gives $\zeta = 0.775 - 0.017 \pm 0.475 = 0.283$ to $1.233$. The corresponding limits for $\rho$ $(= \tanh \zeta)$ are 0.276 to 0.844. Direct reading from the chart in David's tables gives 0.28 and 0.84. Appendix B.11 gives a table of $r = \tanh z'$.
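The steps of Example 6 translate directly into a few lines of code. This is a sketch only: the function name `fisher_limits` is hypothetical, and the bias term uses the sample $r$ in place of the unknown $\rho$, as in the example.

```python
import math


def fisher_limits(r, n_obs, z_crit=1.96):
    """Approximate confidence limits for rho via Fisher's z' transformation."""
    z = math.atanh(r)                        # Eq. (11.13.1)
    bias = r / (2.0 * (n_obs - 1))           # rho/(2n) with r substituted for rho
    half_width = z_crit / math.sqrt(n_obs - 3)
    zeta_low = z - bias - half_width
    zeta_high = z - bias + half_width
    return math.tanh(zeta_low), math.tanh(zeta_high)


low, high = fisher_limits(0.65, 20)
```

The limits are not symmetric about $r$, which reflects the skewness of the distribution of $r$ that the transformation is designed to remove.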

The Fisher transformation is another example of the transformations of variables considered in Chapter 4 (§ 4.3). It achieves at the same time approximate constancy of variance and approximate normality. It is convenient in taking the average of the correlations obtained from several samples, supposedly from the same population. The values of $r$ are transformed to values of $z'$ and each is given the weight $N - 3$, inversely proportional to its variance. The weighted mean of the $z'$ is then transformed back to $r$.


EXAMPLE 7  Separate samples of sizes 50, 70 and 100 give correlation coefficients 0.72, 0.68 and 0.77. What is the best value to use as an average?

The $z'$ are 0.908, 0.829, and 1.020. The weighted mean is

$$\bar{z}' = \frac{1}{211} \left[47(0.908) + 67(0.829) + 97(1.020)\right] = 0.934$$

The corresponding $r$ is 0.733.
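The averaging procedure of Example 7 can be sketched as follows. The function name `pooled_correlation` is an assumption; the weights $N - 3$ are those given in the text.

```python
import math


def pooled_correlation(rs, sizes):
    """Weighted mean of z' values, transformed back to the r scale."""
    weights = [n - 3 for n in sizes]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return z_bar, math.tanh(z_bar)


z_bar, r_bar = pooled_correlation([0.72, 0.68, 0.77], [50, 70, 100])
```

Working with unrounded $z'$ values gives a pooled $r$ that agrees with the worked figure to about three decimals.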


two were rubbed together. 
intermediate in hardness bet 
by taking some standard minerals and labelling them 1, 2 
proper order, but this does i 

If the same individuals are ranked in two ways, say by different judges or according to different criteria, the degree of concordance between the two rankings may be of interest. The coefficient of rank correlation is intended to measure this concordance. If the two judges agree perfectly in their rankings, so that the $i$th individual in one ranking is also $i$th in the other ranking ($i = 1, 2, \ldots, N$), we should expect the coefficient of correlation to be 1; if the judges are diametrically opposite, so that the $i$th individual in one ranking is the $(N - i + 1)$th in the other, the coefficient of correlation should be $-1$. If the ranks are assigned by pure chance we should expect a value of the correlation coefficient near zero. The two principal methods of calculating rank correlation agree in these extreme cases, but differ in the values assigned to intermediate degrees of concordance.

Spearman's coefficient $r_S$ depends on the differences $d_i = x_i - y_i$ between the ranks $x_i$ and $y_i$ of the same individual on the two rankings. In fact,

(11.14.1)  $$r_S = 1 - \frac{6 \sum d_i^2}{N(N^2 - 1)}$$

EXAMPLE 8  Suppose that seven bathing beauties, labelled A to G, were ranked by two judges as in the following table:

TABLE 11.3

Contestant     A   B   C   D   E   F   G
(x) Judge 1    2   1   4   5   3   7   6
(y) Judge 2    3   4   2   5   1   6   7
d^2            1   9   4   0   4   1   1

The differences of the ranks are squared in the last row. Here $N = 7$ and $\sum d_i^2 = 20$, so that

$$r_S = 1 - \frac{6(20)}{7(48)} = 0.64$$

This suggests that the judges agree reasonably well, but the question of significance will be taken up later.
Kendall's method [6] depends on giving each possible pair of individuals in the sample a score of $+1$ or $-1$ according as their ranks are in the same order or in the opposite order on the two rankings. Thus, in Example 8 above, A ranks above D according to both judges, and so the pair AD gets a score $+1$. On the other hand A is below B on one ranking but above on the other, so the pair AB gets a score $-1$. The total number of pairs is $\binom{N}{2} = N(N - 1)/2$. If $S$ is the total score,

(11.14.2)  $$r_K = S \Big/ \binom{N}{2}$$

It is not necessary to consider every one of the pairs in this way. The same result is obtained by writing the $x_i$ in their natural order and then for each $y_i$ counting how many numbers there are to the right of this $y_i$ and greater than it. Let this number be $n_i$ and let $P = \sum n_i$. Then

(11.14.3)  $$r_K = \frac{4P}{N(N - 1)} - 1$$

Rewriting the above table, we have

TABLE 11.4

Contestant   B   A   E   C   D   G   F
x_i          1   2   3   4   5   6   7
y_i          4   3   1   2   5   7   6
n_i          3   3   4   3   2   0   0

so that $P = 15$ and $r_K = 60/42 - 1 = 0.43$. (The number 4 for $y_i$ in the column headed B has three numbers to the right which are greater than 4. This gives $n_1 = 3$.)

The reason that Eq. (3) gives the same result as Eq. (2) is that only pairs with their $y_i$ increasing from left to right will give a positive contribution to the score (since the $x_i$ always increase from left to right). Then $P$ is the total positive contribution to $S$. If the total negative contribution is $-Q$, $P + Q = N(N - 1)/2$, so that $S$, which is defined as $P - Q$, is $2P - N(N - 1)/2$. On substituting this value of $S$ in Eq. (2) we arrive at Eq. (3). The modification of this procedure due to ties in the ranking will be considered in § 11.16.
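Both coefficients for Example 8 can be computed in a few lines, and the shortcut of Eq. (11.14.3) compared with the pair-by-pair score of Eq. (11.14.2). The sketch below assumes untied ranks; the function names are hypothetical.

```python
def spearman(x, y):
    """Spearman's r_S from Eq. (11.14.1), for untied ranks."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_score(x, y):
    """Total score S: +1 for each pair in the same order, -1 otherwise."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
    return s


def kendall_p(x, y):
    """P of Eq. (11.14.3): sort by x, then count larger y-ranks to the right."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ys = [y[i] for i in order]
    return sum(sum(1 for later in ys[i + 1:] if later > ys[i]) for i in range(len(ys)))


# Ranks of contestants A to G from Table 11.3.
judge1 = [2, 1, 4, 5, 3, 7, 6]
judge2 = [3, 4, 2, 5, 1, 6, 7]
n = len(judge1)
r_s = spearman(judge1, judge2)
s = kendall_score(judge1, judge2)
p = kendall_p(judge1, judge2)
r_k = s / (n * (n - 1) / 2)
```

The pairwise count is quadratic in $N$, which is of no consequence for samples of the size considered here.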


* 11.15 Relation Between Rank Correlation and Pearson Correlation  Suppose we have $N$ individuals, or sample items, which are numbered for identification and are given ratings on two attributes $X$ and $Y$. These ratings may be ranks or numerical values, and for the $i$th item will be denoted by $x_i$, $y_i$. For any pair of items, numbered $i$ and $j$, we suppose that a score $a_{ij}$ is allotted on attribute $X$ and a score $b_{ij}$ on $Y$. These scores will naturally depend on the values $x_i$ and $x_j$ or $y_i$ and $y_j$, but we merely require that $a_{ij} = -a_{ji}$ and therefore that $a_{ij} = 0$ when $i = j$, with a similar condition on the $b_{ij}$. We can then define a generalized correlation coefficient $r_G$ by the equation

(11.15.1)  $$r_G = \frac{\sum (a_{ij} b_{ij})}{\left(\sum a_{ij}^2 \sum b_{ij}^2\right)^{1/2}}$$

the sums being over all values of $i$ and $j$ from 1 to $N$ ($i \ne j$).

If we define $a_{ij}$ as $+1$ when the $X$-rank of the $i$th item exceeds that of the $j$th item and $-1$ in the contrary case, then $a_{ij}^2$ will always be 1 for $i \ne j$, and $\sum a_{ij}^2$ will be $N(N - 1)$, the total number of ordered pairs. The same holds for $\sum b_{ij}^2$. But $\sum (a_{ij} b_{ij})$ is twice $S$, since each pair is counted twice, once in the order $ij$ and once in the order $ji$. (Each term in the sum is $+1$ if $a_{ij}$ and $b_{ij}$ are both $+1$ or both $-1$, and is $-1$ if they have different signs.) It follows that $r_G$ is equal to $r_K$ as defined by Eq. (11.14.2). If $x_i$ is the $X$-rank of item number $i$ and $y_i$ its $Y$-rank, and if in Eq. (1) we let

(11.15.2)  $$a_{ij} = x_i - x_j, \qquad b_{ij} = y_i - y_j$$

we arrive at the Spearman coefficient $r_S$. To show this, we note first that $x_i$ and $y_i$ both run through the set of integers from 1 to $N$, so that

(11.15.3)  $$\sum x_i = \sum y_i = \tfrac{1}{2} N(N + 1)$$
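Now

$$\sum (a_{ij} b_{ij}) = \sum_i \sum_j (x_i - x_j)(y_i - y_j) = \sum_{i,j} x_i y_i + \sum_{i,j} x_j y_j - \sum_{i,j} (x_i y_j + x_j y_i)$$

The first two terms on the right-hand side are each equal to $N \sum_i x_i y_i$. The third term is the same as $-2 (\sum_i x_i)(\sum_j y_j)$, since $i$ and $j$ can be interchanged. Therefore, by Eq. (3),

(11.15.4)  $$\sum (a_{ij} b_{ij}) = 2N \sum x_i y_i - \frac{N^2 (N + 1)^2}{2}$$

Putting $x_i = y_i$ we obtain

(11.15.5)  $$\sum a_{ij}^2 = \sum b_{ij}^2 = 2N \sum x_i^2 - \frac{N^2 (N + 1)^2}{2}$$

Since $\sum x_i^2$ is the sum of the squares of the integers from 1 to $N$, which is $N(N + 1)(2N + 1)/6$, the denominator of $r_G$ is

(11.15.6)  $$\left(\sum a_{ij}^2 \sum b_{ij}^2\right)^{1/2} = \frac{N^2 (N + 1)(2N + 1)}{3} - \frac{N^2 (N + 1)^2}{2} = \frac{N^2 (N^2 - 1)}{6}$$

Now, if we write $d_i = x_i - y_i$,

(11.15.7)  $$\sum_{i=1}^{N} d_i^2 = \sum (x_i^2 + y_i^2 - 2 x_i y_i) = 2 \sum x_i^2 - 2 \sum x_i y_i = \frac{N(N + 1)(2N + 1)}{3} - 2 \sum x_i y_i$$

Substituting this expression in Eq. (4), we obtain

(11.15.8)  $$\sum a_{ij} b_{ij} = \frac{N^2 (N + 1)(2N + 1)}{3} - N \sum d_i^2 - \frac{N^2 (N + 1)^2}{2} = \frac{N^2 (N^2 - 1)}{6} - N \sum d_i^2$$

Dividing by Eq. (6), we find that $r_G = r_S$.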




Lastly, if the scores $a_{ij}$ and $b_{ij}$ are based on measured values $x_i$, $y_i$, and if $a_{ij} = x_i - x_j$, $b_{ij} = y_i - y_j$, the numerator of $r_G$ is

(11.15.9)  $$\sum_{i,j} (x_i - x_j)(y_i - y_j) = 2N \sum_i x_i y_i - 2 \left(\sum_i x_i\right)\left(\sum_i y_i\right) = 2N(N - 1) s_{xy}$$

where $s_{xy}$ is the sample covariance of $X$ and $Y$. In the same way, the denominator of $r_G$ is equal to $2N(N - 1) s_x s_y$, and therefore $r_G$ reduces to the ordinary Pearson coefficient. Thus the Spearman coefficient is simply the Pearson coefficient calculated as if the ranks were the actual variates.
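The unifying role of the generalized coefficient can be seen by evaluating Eq. (11.15.1) with different score functions. The sketch below is illustrative: the function names are hypothetical, the ranks are those of Example 8, and the measured values at the end are invented for the Pearson case.

```python
import math


def generalized_r(x, y, score):
    """r_G of Eq. (11.15.1) for an antisymmetric pair score."""
    n = len(x)
    num = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a = score(x[i], x[j])
            b = score(y[i], y[j])
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / math.sqrt(sum_a2 * sum_b2)


def sign_score(u, v):
    """+1 if u exceeds v, -1 if v exceeds u (Kendall's scoring)."""
    return (u > v) - (u < v)


def diff_score(u, v):
    """Plain difference, as in Eq. (11.15.2)."""
    return u - v


ranks_x = [2, 1, 4, 5, 3, 7, 6]
ranks_y = [3, 4, 2, 5, 1, 6, 7]
r_kendall = generalized_r(ranks_x, ranks_y, sign_score)
r_spearman = generalized_r(ranks_x, ranks_y, diff_score)

# Hypothetical measured values: diff_score now yields the Pearson coefficient.
values_x = [1.2, 2.9, 3.1, 4.8, 6.0]
values_y = [0.7, 2.2, 2.0, 3.9, 5.5]
r_pearson = generalized_r(values_x, values_y, diff_score)
```

Only the score function changes between the three cases; the ratio itself is computed identically each time.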


11.16 Tied Ranks  In practice it is often difficult to distinguish between two or more individuals with regard to the attribute considered, and they are reckoned as "tied." In such cases the tied individuals are given a rank which is the average of the ranks they would otherwise occupy. This leaves unchanged the sum of the ranks but reduces somewhat the sum of squares, as compared with a ranking in which there are no ties.

If in the $X$ ranking there are $t$ items tied and in the $Y$ ranking $u$ items tied, the denominator of Kendall's coefficient in Eq. (11.14.2) is replaced by

(11.16.1)  $$\left[\tfrac{1}{2} N(N - 1) - \tfrac{1}{2} t(t - 1)\right]^{1/2} \left[\tfrac{1}{2} N(N - 1) - \tfrac{1}{2} u(u - 1)\right]^{1/2}$$

and if there are several sets of ties the appropriate amount is subtracted for each such set. The numerator is calculated as before, all tied pairs contributing zero to the total score $S$. Thus, suppose the ranks are as given in the following table:

EXAMPLE 9

TABLE 11.5

x_i   1    2½   2½   4    5    7    7    7    9    10
y_i   2    1    4    4    7½   9    10   4    6    7½
n_i   8    7½   6    5½   2½   1    ½    2    1    0

There are two sets of ties, with $t = 2$ and $t = 3$, in the first ranking and two sets, with $u = 2$ and $u = 3$, in the second ranking. The expression (1) is $(45 - 1 - 3)^{1/2} (45 - 1 - 3)^{1/2} = 41$. In the shorter method of calculation of $S$, any number to the right of $y_i$ and equal to it counts ½ towards $n_i$ and any number (greater or smaller) counts ½ if its $x$ rank is the same. Then $P = 34$ and $S = 23$, giving $r_K = 23/41 = 0.56$.

In Spearman's method, the formula as corrected for one set of $t$ ties in $X$ and one set of $u$ ties in $Y$ is

(11.16.2)  $$r_S = \frac{N^3 - N - 6 \sum d^2 - \tfrac{1}{2}(t^3 - t) - \tfrac{1}{2}(u^3 - u)}{\left[N^3 - N - (t^3 - t)\right]^{1/2} \left[N^3 - N - (u^3 - u)\right]^{1/2}}$$

and each set of ties gives a separate correction.

In Example 9, $\sum d^2 = 49$, and $N^3 - N = 990$. Therefore, with $t = 2$ and 3, and $u = 2$ and 3,

$$r_S = \frac{990 - 294 - 3 - 12 - 3 - 12}{990 - 6 - 24} = \frac{666}{960} = 0.69$$

The uncorrected value is 0.70.
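The tie corrections of Eqs. (11.16.1) and (11.16.2) can be applied mechanically once the sizes of the tied sets are known. The sketch below is an illustration, with hypothetical function names; it scores tied pairs as zero in $S$ and sums the corrections over every tied set. The ranks are those of Table 11.5, with mid-ranks written as decimals.

```python
import math
from collections import Counter


def tie_sizes(ranks):
    """Sizes of the sets of tied ranks (sets of size 1 are ignored)."""
    return [count for count in Counter(ranks).values() if count > 1]


def kendall_tied(x, y):
    """Score S and r_K with the denominator of Eq. (11.16.1)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)          # tied pairs contribute zero
    pairs = n * (n - 1) / 2
    denom_x = pairs - sum(t * (t - 1) / 2 for t in tie_sizes(x))
    denom_y = pairs - sum(u * (u - 1) / 2 for u in tie_sizes(y))
    return s, s / math.sqrt(denom_x * denom_y)


def spearman_tied(x, y):
    """r_S corrected for ties, Eq. (11.16.2), summed over all tied sets."""
    n = len(x)
    m = n ** 3 - n
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    tx = sum(t ** 3 - t for t in tie_sizes(x))
    ty = sum(u ** 3 - u for u in tie_sizes(y))
    return (m - 6 * d2 - tx / 2 - ty / 2) / math.sqrt((m - tx) * (m - ty))


x_ranks = [1, 2.5, 2.5, 4, 5, 7, 7, 7, 9, 10]
y_ranks = [2, 1, 4, 4, 7.5, 9, 10, 4, 6, 7.5]
s_tied, r_k_tied = kendall_tied(x_ranks, y_ranks)
r_s_tied = spearman_tied(x_ranks, y_ranks)
```

Counting concordant and discordant pairs directly gives the same $S$ as the half-count rule used with the $n_i$ row of the table.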


11.17 Significance of the Rank Correlation Coefficient  The rank correlation coefficient can be used as a test of the association between $X$ and $Y$, and has the advantage over the Pearson coefficient of not requiring the assumption that one or both variables are normally distributed. If we suppose that in the parent population $X$ and $Y$ are independent of each other, then if the $N$ items in a sample are placed in their natural order of ranking for $X$ (i.e., $1, 2, 3, \ldots, N$), the order of ranking for $Y$ is equally likely to be any one of the $N!$ permutations of the numbers 1 to $N$. (For the present, we are assuming that there are no ties.) For small values of $N$ it is possible to calculate for each of these permutations the corresponding value of $r_K$ and so form a probability distribution. Thus if $N = 5$, there are 120 possible rankings for $Y$, and these give the following possible values of $S$ and of $r_K$ (different rankings may correspond to the same value of $S$ in (11.14.2)).

TABLE 11.6

S      10    8    6    4    2    0   -2   -4   -6   -8  -10
r_K   1.0  0.8  0.6  0.4  0.2    0 -0.2 -0.4 -0.6 -0.8 -1.0
f       1    4    9   15   20   22   20   15    9    4    1

A frequency polygon drawn for the distribution of $S$ lies fairly close to a normal curve, and in fact it may be proved that, as $N$ increases, this distribution tends to normality, with variance $N(N - 1)(2N + 5)/18$. For $N > 10$ the approximation is quite good.


From the above table it is evident that the probability, when $N = 5$, of a value of $r_K$ numerically as high as 0.8, when $X$ and $Y$ are independent, is $10/120 = 0.083$, so that even a value as high as this is not significant at the 5% level. For a sample of 10, the standard deviation of $S$ is $(125)^{1/2} = 11.2$, so that a significant value at the 5% level, obtained from the normal approximation, would be $1.96 \times 11.2 = 22.0$. This corresponds to a value of $r_K = \pm 0.49$. The exact probability for $|S| \ge 23$ is 0.046, a little under 5%.

In estimating the significance of an observed $S$ we should make a correction for continuity similar to that made in replacing the binomial distribution by a normal distribution. Since the distribution of $S$ is discrete, successive values differing by 2 units, the observed $S$ should be replaced by $S - 1$ if $S$ is positive or by $S + 1$ if it is negative.
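For small $N$ the exact distribution of $S$ can be obtained by brute-force enumeration of the $N!$ permutations, which reproduces Table 11.6 and allows the variance formula to be checked. The following is a sketch; the function names are hypothetical, and the last function applies the continuity correction described above.

```python
import math
from collections import Counter
from itertools import permutations


def score(perm):
    """Kendall's S for a Y-ranking listed against X in natural order."""
    s = 0
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            s += 1 if perm[j] > perm[i] else -1
    return s


def s_distribution(n):
    """Frequency of each value of S over all n! equally likely rankings."""
    return Counter(score(p) for p in permutations(range(1, n + 1)))


def normal_deviate(s, n):
    """Continuity-corrected S divided by its standard deviation."""
    if s > 0:
        corrected = s - 1
    elif s < 0:
        corrected = s + 1
    else:
        corrected = 0
    return corrected / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)


dist = s_distribution(5)
var_exact = sum(f * s * s for s, f in dist.items()) / math.factorial(5)
var_formula = 5 * 4 * (2 * 5 + 5) / 18
tail = (dist[10] + dist[8] + dist[-8] + dist[-10]) / 120
```

Enumeration is practical only for small samples, since the number of permutations grows as $N!$; for larger $N$ the normal approximation is used instead.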
Spearman's coefficient $r_S$ also tends to normality as $N$ increases, but more slowly than $r_K$. Whether there are ties or not, the variance of $\sum d^2$ is given by

(11.17.1)  $$V\left(\sum d^2\right) = \left(\frac{N^3 - N}{6}\right)^2 \frac{1}{N - 1}$$

and that of $r_S$ by

(11.17.2)  $$V(r_S) = \frac{1}{N - 1}$$

When $N > 20$ the distribution may be taken as approximately normal.

Down to somewhat lower values of $N$ (say 10) the distribution of $r_S [(N - 2)/(1 - r_S^2)]^{1/2}$ is approximately that of Student's $t$ with $N - 2$ degrees of freedom. This is the same as the distribution which was shown earlier to hold exactly (on certain assumptions) for Pearson's coefficient in a sample from an uncorrelated parent population.
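The Student-t approximation is a one-line computation. As an illustration, the sketch below applies it to the corrected Spearman coefficient of Example 9 ($r_S = 666/960$, $N = 10$); the function name is hypothetical, and the resulting statistic would be referred to a table of $t$ with $N - 2$ degrees of freedom.

```python
import math


def spearman_t(r_s, n_obs):
    """Approximate t statistic and degrees of freedom for Spearman's r_S."""
    t_value = r_s * math.sqrt((n_obs - 2) / (1.0 - r_s ** 2))
    return t_value, n_obs - 2


t_value, df = spearman_t(666 / 960, 10)
```

The statistic is undefined when $r_S = \pm 1$, in which case the exact permutation argument should be used instead.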


11.18 Contingency Tables  Often in medical, biological, psychological or econometric research we encounter characteristics or attributes which we cannot measure accurately and according to which we may not even be able to rank the individuals of a sample, but which do permit us to divide the sample into classes and count the numbers in each class. We might, for example, classify a sample of women students by the color of their hair, as "fair-haired," "red-haired," "brown-haired" or "black-haired," or a sample of housewives by their place of residence as "rural" or "urban."

A frequency table in which a sample is classified according to two different attributes (whether quantitative or not) is called a contingency table. It looks like an ordinary two-way frequency table, except that the columns and rows do not necessarily correspond to any numerical values of the attributes $X$ and $Y$. If a sample of $N$ is divided into $s$ $X$-classes (denoted by $X_1, X_2, \ldots, X_s$) and into $t$ $Y$-classes (denoted by $Y_1, Y_2, \ldots, Y_t$), the frequency $f_{ij}$ of individuals falling into class $X_i$ and also into class $Y_j$ is entered in the $i$th row and $j$th column of the table. Thus, for $s = 4$ and $t = 3$:

TABLE 11.7

        Y_1    Y_2    Y_3
X_1     f_11   f_12   f_13   r_1
X_2     f_21   f_22   f_23   r_2
X_3     f_31   f_32   f_33   r_3
X_4     f_41   f_42   f_43   r_4
        c_1    c_2    c_3    N



The marginal total for the $i$th row is $r_i = \sum_j f_{ij}$ and that for the $j$th column is
$c_j = \sum_i f_{ij}$. The grand total is $N = \sum_i r_i = \sum_j c_j$.

We may assume that in the population there is a probability $\pi_{ij}$ that an
individual selected at random will fall in classes $X_i$ and $Y_j$. The relative
frequency $f_{ij}/N$ will be an approximation to $\pi_{ij}$. If X and Y are independent we
shall have

(11.18.1)   $\pi_{ij} = \pi_{i.}\,\pi_{.j}$

where $\pi_{i.}$ ($= \sum_j \pi_{ij}$) is the probability of $X_i$ regardless of the Y classes, and
$\pi_{.j}$ ($= \sum_i \pi_{ij}$) is the probability of $Y_j$ regardless of the X classes. We can define
the mean square contingency, which is a measure of the degree of association
between X and Y in the population, by

(11.18.2)   $\phi^2 = \sum_{ij} \frac{(\pi_{ij} - \pi_{i.}\pi_{.j})^2}{\pi_{i.}\pi_{.j}}$

This is 0 if and only if X and Y are independent. Its greatest possible value is
q − 1 where q is the smaller of the numbers s and t (or their common value if
they are equal). The quantity $\phi^2/(q - 1)$ may therefore be used as a measure of
the degree of association, and, like $r^2$, it varies between 0 and 1.
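As a rough numerical check of the two limiting cases of Eq. (11.18.2), the sketch below evaluates the mean square contingency for two hypothetical population tables, one built as a product of its margins and one with all the probability on the diagonal. The tables and the function name are illustrative assumptions.

```python
def mean_square_contingency(pi):
    """phi^2 = sum_ij (pi_ij - pi_i. * pi_.j)^2 / (pi_i. * pi_.j), as in Eq. (11.18.2)."""
    row = [sum(r) for r in pi]            # marginal probabilities pi_i.
    col = [sum(c) for c in zip(*pi)]      # marginal probabilities pi_.j
    return sum(
        (pi[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
        for i in range(len(pi))
        for j in range(len(pi[0]))
    )

# Hypothetical 2 x 3 table with pi_ij = pi_i. * pi_.j (rows 0.6, 0.4; columns 0.2, 0.3, 0.5).
independent = [[0.12, 0.18, 0.30],
               [0.08, 0.12, 0.20]]

# Hypothetical 2 x 2 table with complete association, so q = 2.
perfect = [[0.5, 0.0],
           [0.0, 0.5]]

phi2_independent = mean_square_contingency(independent)
phi2_perfect = mean_square_contingency(perfect)
print(phi2_independent, phi2_perfect)
```

The first value is zero up to rounding error and the second equals q − 1 = 1, in line with the limits stated above.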
The expected frequency in the $i, j$th cell of the contingency table is $N\pi_{ij}$ and
the deviation of the table from expectation can be measured by calculating the
quantity

(11.18.3)   $\chi^2 = \sum_{ij} \frac{(f_{ij} - N\pi_{ij})^2}{N\pi_{ij}}$

the sum being extended over all the cells of the table. Since we usually wish to
test the hypothesis that X and Y are independent, we can replace $\pi_{ij}$ by $\pi_{i.}\pi_{.j}$, but
these marginal probabilities are unknown, and must be estimated from the
sample. It is natural to estimate $\pi_{i.}$ and $\pi_{.j}$ by the relative marginal frequencies
$r_i/N$ and $c_j/N$ respectively, and in fact these are the estimators given by the
method of maximum likelihood.
The likelihood of the observed sample of N picked from the assumed
population is given by

(11.18.4)   $L = \prod_{ij} (\pi_{i.}\pi_{.j})^{f_{ij}}$

where the $\pi_{i.}$ and $\pi_{.j}$ are subject to the restrictions $\sum_i \pi_{i.} = 1$, $\sum_j \pi_{.j} = 1$. Using the
method of Lagrange multipliers (Appendix A.5) and maximizing
$\log L - \lambda \sum_i \pi_{i.} - \mu \sum_j \pi_{.j}$ we obtain the relations

(11.18.5)   $r_i/\hat{\pi}_{i.} = \lambda, \qquad c_j/\hat{\pi}_{.j} = \mu$

It follows, since $\sum_i r_i = \sum_j c_j = N$, that $\lambda = \mu = N$, so that $r_i = N\hat{\pi}_{i.}$ and $c_j = N\hat{\pi}_{.j}$.

Using these estimators, we can take the expected frequency in the $i, j$th cell as

(11.18.6)   $\phi_{ij} = r_i c_j / N$

Therefore,

(11.18.7)   $\chi_s^2 = \sum_{ij} \frac{(f_{ij} - \phi_{ij})^2}{\phi_{ij}} = \sum_{ij} \frac{f_{ij}^2}{\phi_{ij}} - N = N\left[\sum_{ij} \frac{f_{ij}^2}{r_i c_j} - 1\right]$

ij in the contingency table 
) of the cells, and when these are 


Since the $\chi^2$ distribution applies strictly only in the limiting case, as the
expected frequencies increase indefinitely, the approximation should not be
used if some of these frequencies are very small. Some investigations (see [10])
suggest that a minimum value of $\phi_{ij}$ as low as 1 may be tolerated if values below
5 do not occur in more than about 20% of the cells.

If a larger proportion of the cells have expected frequencies below 5, it is
wise to use an exact method [11].

EXAMPLE 10   Table 11.8 gives some results obtained by Woo (Biometrika,
1928) on the association between "left-handedness" and "left-eyedness."
The X-categories are left-handed, ambidextrous and right-handed, the Y-
categories left-eyed, ambiocular, and right-eyed.

TABLE 11.8

          Y1     Y2     Y3

Here $\chi_s^2 = 413[(34)^2/(124 \times 118) + (62)^2/(124 \times 195) + \ldots
+ (52)^2/(214 \times 100) - 1] = 4.02$.

The sample quantity which corresponds to the mean square contingency $\phi^2$
defined in Eq. (11.18.2) is

(11.18.8)   $f^2 = \frac{\chi_s^2}{N} = \sum_{ij} \frac{f_{ij}^2}{r_i c_j} - 1$

This, divided by q − 1 (q being the lesser of s and t), is a measure of the
degree of association indicated by the sample. The upper limit 1 is attained if
(for s < t) each column contains just one non-zero frequency, or (for s > t) if
each row contains just one non-zero frequency. The quantity $C = [f^2/(q - 1)]^{1/2}$
is a coefficient of contingency.

In Example 10 above, q = 3 and $f^2/(q - 1) = 4.02/826 = 0.0049$, so that
C = 0.07. There is very little association between X and Y.


11.19 The Contingency Table with Two Rows or Two Columns   For a
2 × n table the calculation of $\chi_s^2$ may be somewhat simplified. As shown by
Brandt and Snedecor, the value for a sample, with frequencies as given in the
following table, is:

(11.19.1)   $\chi_s^2 = \frac{N^2}{r_1 r_2}\left[\sum_j \frac{a_j^2}{c_j} - \frac{r_1^2}{N}\right]$

where $c_j = a_j + b_j$.

TABLE 11.9

          Y1    Y2    Y3    ...   Yn
X1        a1    a2    a3    ...   an    r1
X2        b1    b2    b3    ...   bn    r2
          c1    c2    c3    ...   cn    N

Either row in the table can, of course, be chosen as the $a_j$.


EXAMPLE 11 (Lindstrom)   The variable X is the presence (or absence) of a
sugar-producing gene in ears of corn, Y is the number of rows of kernels in the
ear.

TABLE 11.10

               No. of Rows of Kernels
               8     10     12     14
Present       18     37     27      0      82
Absent        15     26     43      4      88
              33     63     70      4     170

Since the numbers in the last column (Y = 14) are so small, it is better to group
the last two columns together. We then have a 2 × 3 table, and Eq. (11.19.1) gives

$\chi_s^2 = \frac{(170)^2}{(82)(88)}\left[\frac{18^2}{33} + \frac{37^2}{63} + \frac{27^2}{74} - \frac{82^2}{170}\right] = 7.39$

With 2 d.f., the probability of a value as large as this is about 0.025, so that
association between presence of the sugar gene and few rows of kernels is
definitely indicated, although not strongly so.

The value of the contingency coefficient C is 0.21, which is fairly high for
this coefficient.
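The arithmetic of Example 11 can be retraced with a short sketch of the Brandt-Snedecor formula, Eq. (11.19.1), using the grouped counts of Table 11.10. The function name is illustrative; the tail probability uses the fact that for 2 degrees of freedom the upper tail of the chi-square distribution is exactly exp(−χ²/2).

```python
import math

def brandt_snedecor(a, b):
    """Chi-square for a 2 x n table: (N^2/(r1*r2)) * (sum a_j^2/c_j - r1^2/N)."""
    r1, r2 = sum(a), sum(b)
    n = r1 + r2
    s = sum(aj * aj / (aj + bj) for aj, bj in zip(a, b))
    return n * n / (r1 * r2) * (s - r1 * r1 / n)

# Table 11.10 with the last two columns (12 and 14 rows of kernels) grouped.
present = [18, 37, 27 + 0]
absent = [15, 26, 43 + 4]

chi2 = brandt_snedecor(present, absent)
n_total = sum(present) + sum(absent)

# Upper-tail probability for 2 d.f. (exact closed form for this case only).
p_value = math.exp(-chi2 / 2.0)

# Coefficient of contingency C = sqrt(f^2/(q-1)) with f^2 = chi2/N and q = 2.
q = 2
contingency = math.sqrt((chi2 / n_total) / (q - 1))

print(round(chi2, 2), round(p_value, 3), round(contingency, 2))
```

Carried to more decimals the statistic comes out near 7.40, which agrees with the 7.39 quoted above to within rounding of the intermediate quotients.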


With a 2 × 2 table the calculation of $\chi_s^2$ is still simpler. If the frequencies in
the four cells are denoted by a, b, c, d, the value of $\chi_s^2$ is given by

$\frac{\chi_s^2}{N} = \frac{a^2/r_1 + c^2/r_2}{c_1} + \frac{b^2/r_1 + d^2/r_2}{c_2} - 1$

On substituting a + b for $r_1$, etc., and carrying out some algebraic manipulations,
this becomes

(11.19.2)   $\chi_s^2 = \frac{N(ad - bc)^2}{r_1 r_2 c_1 c_2}$

EXAMPLE 12   The frequencies are those of Table 11.11.

TABLE 11.11

                 Sick    Not Sick
With drug         10        15        25
Without drug      19         6        25
                  29        21        50

$\chi_s^2 = \frac{50(60 - 285)^2}{25 \times 25 \times 29 \times 21} = 6.65$

The probability of a value as high as this with 1 d.f. is a little less than 0.01, so
that an association between the use of the drug and immunity from sickness is
pretty definitely indicated. The coefficient of contingency is 0.37.

Note that with one degree of freedom, the distribution of $\chi_s$ (the square root
of $\chi_s^2$) is normal. The probability can therefore be found from a table of the
normal law, using both tails.
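A minimal sketch of Eq. (11.19.2) applied to Table 11.11, with the probability obtained from both tails of the normal law as the note above suggests. The helper names are illustrative, and the normal tail is computed with the standard-library complementary error function rather than read from a table.

```python
import math

def chi_square_2x2(a, b, c, d):
    """N(ad - bc)^2 / (r1 r2 c1 c2) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def two_tail_normal(z):
    """P(|Z| >= z) for a standard normal variate Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Table 11.11: with drug (10 sick, 15 not sick), without drug (19 sick, 6 not sick).
chi2 = chi_square_2x2(10, 15, 19, 6)

# With 1 d.f. the square root of chi-square is treated as a normal deviate.
p_value = two_tail_normal(math.sqrt(chi2))

# Coefficient of contingency with q = 2, so q - 1 = 1.
contingency = math.sqrt(chi2 / 50)

print(round(chi2, 2), round(p_value, 4), round(contingency, 2))
```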


11.20 The Yates Correction for Continuity   The cell-frequencies in a contingency
table are necessarily integers, so that $\chi_s^2$ is a discrete variate, whereas $\chi^2$
varies continuously. The situation is something like that in approximating a
binomial distribution by a continuous normal distribution, where the sum of
terms from x = a to x = b (inclusive) is approximated by an integral from
a − ½ to b + ½ (see § 3.11).

In the 2 × 2 table, as pointed out by Yates, the approximation to $\chi^2$ is much
improved by replacing d by d − ½ or d + ½, according as ad > bc or ad < bc,
and adjusting the other frequencies so as to keep the marginal totals constant.
The effect of this is to replace $(ad - bc)^2$ in Eq. (11.19.2) by $(|ad - bc| - N/2)^2$,
and thereby to reduce somewhat the apparent significance of the result.

In Example 12, above, the rearranged table would be as shown. Since
$[(10\tfrac{1}{2})(6\tfrac{1}{2}) - (14\tfrac{1}{2})(18\tfrac{1}{2})]^2 = (200)^2 = (225 - 25)^2$, the value of $\chi_s^2$, with the
correction, is reduced to 5.25, the probability for which is more than 0.02.
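The corrected value for Example 12 can be checked in two ways: by moving each cell half a unit toward expectation, and by the closed form $(|ad - bc| - N/2)^2$. The sketch below does both for Table 11.11; the function names are illustrative.

```python
import math

def chi_square_yates(a, b, c, d):
    """N(|ad - bc| - N/2)^2 / (r1 r2 c1 c2) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (abs(a * d - b * c) - n / 2.0) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))

def chi_square_adjusted_cells(a, b, c, d):
    """Same correction done cell by cell, keeping the marginal totals fixed."""
    shift = 0.5 if a * d < b * c else -0.5   # d + 1/2 when ad < bc, d - 1/2 when ad > bc
    a2, b2, c2, d2 = a + shift, b - shift, c - shift, d + shift
    n = a + b + c + d
    return n * (a2 * d2 - b2 * c2) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

chi2_corrected = chi_square_yates(10, 15, 19, 6)
chi2_cells = chi_square_adjusted_cells(10, 15, 19, 6)

# Two-tailed normal probability for the square root of the corrected value (1 d.f.).
p_value = math.erfc(math.sqrt(chi2_corrected) / math.sqrt(2.0))

print(round(chi2_corrected, 2), round(p_value, 3))
```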


If the total frequency N < 20, or if 20 < N < 40 and the smallest expected
frequency is less than 5, it is better to use Fisher's exact method, given in the
next section.

* 11.21 Fisher's Exact Method for 2 × 2 Tables   Fisher has pointed out that
exact probabilities can be calculated for all the possible 2 × 2 tables which have
the same set of marginal frequencies. Let the observed cell-frequencies be a, b,
c, d, and suppose the table so arranged that d is the smallest of these. The
distribution of the N items in the sample among the four cells (on the hypothesis
that there is no association between X and Y) is a hypergeometric one (see
§ 3.5). It has the following mathematical model: Given N balls in an urn, of
which $r_1$ are black (corresponding to $X_1$) and $r_2$ are white (corresponding to $X_2$),
and given N boxes of which $c_1$ are red (for $Y_1$) and $c_2$ are green (for $Y_2$),
withdraw the balls one at a time and place them at random in the boxes, one ball to a
box. The number of black balls in red boxes will be a, and similarly for the other
frequencies.

The probability that there are just b black balls and d white ones in the $c_2$
green boxes is $\binom{r_1}{b}\binom{r_2}{d} \big/ \binom{N}{c_2}$, since the numerator is the number of ways of
choosing b black balls out of $r_1$ and d white balls out of $r_2$, while the denominator
is the total number of ways of picking $c_2$ balls out of the urn. But once the
green boxes have been filled, the numbers a and c for the red boxes are fixed,
since $a = r_1 - b$ and $c = r_2 - d$. The probability of the whole observed set of
frequencies a, b, c, d, is, therefore,

(11.21.1)   $p(d) = \frac{\binom{r_1}{b}\binom{r_2}{d}}{\binom{N}{c_2}} = \frac{r_1!\, r_2!\, c_1!\, c_2!}{a!\, b!\, c!\, d!\, N!}$

(11.21.2)   $P = \sum_{d=0}^{d_0} p(d)$

where $d_0$ is the observed d. (The value of d determines the whole table, the
marginal frequencies being fixed.) If $d_0 > \delta$, the sum will go from $d = d_0$
up to $d = c_2$, [...] with values of d larger than the
[...] corresponds to one tail of the distribution,
[...] both tails. It is to be expected, therefore,
[...] will be an approximation to 2P and not
to P.


In Example 12 above, d = 6 and δ = 10.5, so that

$P = \sum_{d=0}^{6} p(d)$

The values of p(d) are given in the following table:

d         6          5          4          3        2, 1, 0
p(d)   0.00860    0.00161    0.00020    0.00002    0.00000

and we find that P = 0.0104. The $\chi^2$ value, with Yates's correction, gives
P = 0.022, which is quite close to 2P.

The chief objection to Fisher's method is the considerable amount of
computation usually involved. Tables recently published by Mainland and
others [12] enable the significance (at 5% and 1% levels) to be estimated very
quickly, without calculation.
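With a modern computing aid the exact probabilities are no longer laborious. The sketch below evaluates Eq. (11.21.1) for Example 12 using the standard-library binomial coefficient; the function name and the dictionary layout are illustrative choices.

```python
from math import comb

def table_probability(d, r1, r2, c2):
    """p(d) = C(r1, b) * C(r2, d) / C(N, c2), with b = c2 - d and N = r1 + r2."""
    b = c2 - d
    return comb(r1, b) * comb(r2, d) / comb(r1 + r2, c2)

# Example 12: r1 = r2 = 25, c2 = 21 ("Not Sick"), observed d = 6.
r1, r2, c2, d_observed = 25, 25, 21, 6

probabilities = {d: table_probability(d, r1, r2, c2) for d in range(d_observed + 1)}
tail = sum(probabilities.values())   # one-tailed P of Eq. (11.21.2)

for d in sorted(probabilities, reverse=True):
    print(d, f"{probabilities[d]:.5f}")
print("P =", round(tail, 4))
```

The printed values for d = 6, 5, 4, 3 reproduce the tabulated 0.00860, 0.00161, 0.00020 and 0.00002, and the terms for d = 2, 1, 0 vanish to five decimals.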

An alternative procedure is to use a normal approximation with a continuity
correction. The variance of d, as given by Eq. (3.5.7), is

(11.21.3)   $V(d) = \frac{N - c_2}{N - 1}\, c_2\, \frac{r_1}{N}\, \frac{r_2}{N} = \frac{c_1 c_2 r_1 r_2}{N^2(N - 1)}$

and if

(11.21.4)   $z = \frac{|d - \delta| - \tfrac{1}{2}}{[V(d)]^{1/2}}$

is treated as a normal variate, the probability of a value at least as great as this
can be found.

Thus, in Example 12, $|d - \delta| - \tfrac{1}{2} = 4$, V(d) = 3.11, so that z = 4/1.76
= 2.27, giving P = .0116. This is fairly close to the exact value.
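The figures just quoted can be reproduced as follows. The sketch assumes that δ denotes the expected value of d under the hypothesis, $r_2 c_2/N$, which is consistent with δ = 10.5 in Example 12; the function name is illustrative.

```python
import math

def normal_approximation(d, r1, r2, c1, c2):
    """Return (delta, V(d), z) for Eqs. (11.21.3) and (11.21.4)."""
    n = r1 + r2
    delta = r2 * c2 / n                              # assumed: expectation of d
    variance = c1 * c2 * r1 * r2 / (n * n * (n - 1))
    z = (abs(d - delta) - 0.5) / math.sqrt(variance)
    return delta, variance, z

# Example 12: margins 25, 25 (rows) and 29, 21 (columns), observed d = 6.
delta, variance, z = normal_approximation(6, 25, 25, 29, 21)

# One-tailed normal probability of a deviate at least as great as z.
p_one_tail = 0.5 * math.erfc(z / math.sqrt(2.0))

print(delta, round(variance, 2), round(z, 2), round(p_one_tail, 4))
```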


11.22 The Chi-Square Test as a Test of Homogeneity   It sometimes happens
that a table which looks like a contingency table really reflects a different
situation. The rows of the table represent each a different set of observations, $r_i$
in number, the individuals in each set being classified according to the attribute
Y. The numbers $r_i$ are selected arbitrarily and do not depend at all on the
population. The hypothesis to be tested is that each sample (represented by a
row of the table) comes from the same population in which the probability of
attribute $Y_j$ is $\pi_j$ (with $\sum_j \pi_j = 1$).

The value of $\pi_j$ is estimated as before by $c_j/N$. It may be shown that the
limiting distribution of $\chi_s^2$, calculated in the ordinary way, is still the $\chi^2$
distribution with (s − 1)(t − 1) degrees of freedom.


EXAMPLE 13   In order to see if the age-distribution of whitefish in Lake
Wabamun, Alberta, had changed significantly between 1957 and 1958, samples
of the catches in these two years were classified in age-groups as follows:

TABLE 11.12

                      Age (Years)
Year      ≤4     5     6     7     8    ≥9
1957       6    15    10    38    62    26     157
1958      16    12     9    22    36     5     100
          22    27    19    60    98    31     257

Here

$\chi_s^2 = \frac{(257)^2}{(100)(157)}\left[\frac{16^2}{22} + \frac{12^2}{27} + \ldots + \frac{5^2}{31} - \frac{100^2}{257}\right] = 18.6$

The number of degrees of freedom is 5, so that this value of $\chi_s^2$ is highly
significant. The value of $f^2$ is 0.072, giving C = 0.27.
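Since the homogeneity test uses the ordinary calculation, Example 13 can be verified with the general expected-frequency form of the statistic. The sketch below takes the counts of Table 11.12; the function name is illustrative.

```python
import math

def chi_square(table):
    """Sum over cells of (f - e)^2 / e, with e = (row total)(column total)/N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum(
        (f - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, row in enumerate(table)
        for j, f in enumerate(row)
    )

# Table 11.12: catches of 1957 and 1958 classified by age group.
catches = [[6, 15, 10, 38, 62, 26],
           [16, 12, 9, 22, 36, 5]]

chi2 = chi_square(catches)
n_total = sum(sum(row) for row in catches)
degrees_of_freedom = (len(catches) - 1) * (len(catches[0]) - 1)

f2 = chi2 / n_total            # sample mean square contingency
contingency = math.sqrt(f2)    # q = 2, so q - 1 = 1

print(round(chi2, 1), degrees_of_freedom, round(f2, 3), round(contingency, 2))
```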


PROBLEMS

A (§§ 11.1-11.2)

1. If the joint probability density for X and Y is $f(x, y) = 2/a^2$, 0 < x < y,
0 < y < a, find (a) the marginal probability densities g(x) and h(y), (b) the regression
equations of Y on X and of X on Y, (c) the means and variances of X and Y, the
covariance, and the coefficient of correlation between X and Y. Hint: f(x, y) is
constant over a triangular area in the xy plane.

2. Is it true that a necessary and sufficient condition for two variates X and Y to
have a bivariate normal distribution is that the two regression equations are linear?
Hint: See Problem 1 above.


3. For the bivariate normal distribution, Eq. (1 1.2.5), show th 


н J at the variance of 7: 
is equal to p? and that the correlation co 


efficient between 7: and v is equal to p. 
4. Show that the coefficient of correlation for two variates X and Y is the geometric
mean of the slopes of the two regression lines, one reckoned from the X-axis and one
from the Y-axis. (The geometric mean of a and b is $\sqrt{ab}$.)

5. Show that if X and Y are independent variates the 
(The condition for independence i 


6. If X is uniformly distributed on ( 


р ox toy” 

8. Let the variate X have the marginal distribution g(x) = 1, −½ < x < ½, and let
the conditional density of Y, given X = x, be f(y|x) = 1, x < y < x + 1, −½ < x <
0, f(y|x) = 1, −x < y < 1 − x, 0 < x < ½, and f(y|x) = 0 otherwise. Prove that
X and Y are uncorrelated.

9. If $X_1, X_2, X_3$ are uncorrelated variates, each with the same standard deviation σ,
find the coefficient of correlation between $X_1 + X_2$ and $X_2 + X_3$.

10. If X and Y are uncorrelated, with means zero and variances $\sigma_x^2$, $\sigma_y^2$, show
that the variates U = X cos α + Y sin α and V = X sin α − Y cos α have a correlation
coefficient

$\rho = \frac{\sigma_x^2 - \sigma_y^2}{[(\sigma_x^2 - \sigma_y^2)^2 + 4\sigma_x^2\sigma_y^2 \operatorname{cosec}^2 2\alpha]^{1/2}}$

B (§§ 11.3-11.6)


1. The following data represent the ages of husband (X) and wife (Y) for 20 couples
selected at random from a certain population.

X | 22 24 26 26 27 27 28 28 29 30 30 30 31 32 33 34 35 35 36 37

Y | 18 20 20 24 22 24 27 24 21 25 29 32 27 27 30 27 30 31 30 32

Find the equations of the two regression lines. Make a scatter diagram for the data
and draw the two lines on it.

2. Calculate the coefficient of correlation of X and Y from the data of Problem B.1.
On the assumption that the population is bivariate normal, find 95% confidence limits
for the two regression coefficients, β (for the first regression line) and β' (for the second).

3. In studying a set of pairs of values of related variates X and Y, a statistician has
computed the following quantities: N = 100, $\sum x$ = 12,500, $\sum y$ = 8,000, $\sum x^2$ =
1,585,000, $\sum y^2$ = 648,100, $\sum xy$ = 1,007,425. Calculate $\bar{x}$, $\bar{y}$, $s_x$, $s_y$, $s_{xy}$ and r for
these data.

4. In the following table, X is the weight (to nearest half pound) and Y the height 
(to nearest tenth of an inch) for 200 freshmen at a university. 


X| 90 100 110 120 130 140 150 160 170 180 190 200 


Y 5 4095 — = = =o = = = = = S 
76-77.9 1 

74-15.9 1 1 1 1 

72- I 1 4 1 

70- i 2 G6 T 6 Z ILI 9€ X 1 
68- 2 8 @ 8 9 1 1 1 
66- 8 16 14 13 6 2 1 1 
64- 3 s 7T Ff 9 а 1 1 

62- 1 4 l| T7 a 

60- 

58-59.9 1 


Find the regression equation of height on weight, and give 95% confidence limits for
the regression coefficient β. Calculate the correlation coefficient between height and
weight.

5. A coefficient of correlation calculated from a sample of size 25 is found to be
0.37. Is this value significantly different (at the 5% level) from zero?

6. A sample correlation coefficient of 0.561 is said to be highly significant. Assuming
that this means that the probability of getting a value numerically as great is less
than 0.01, what is the smallest sample size that would warrant the statement? Hint:
$r[(N - 2)/(1 - r^2)]^{1/2}$ is to be the same as $t_{.01}$ for N − 2 d.f. Find N − 2 by trial,
using the table of t.

7. Is it true that a correlation coefficient of r = 0.6 indicates a relationship twice
as close as that indicated by r = 0.3? Hint: Consider the relative accuracy of estimation
of Y from a given X in the two cases, as measured by the reciprocal of the standard error
of estimate.

8. The marks of a class of 12 students on a mid-term test (x) and on the final 
examination (y) were: 


x   41  45  50  68  47  77  90  100  80  100  40  43

y   60  63  60  48  85  56  53   91  74   98  65  43

What is the regression estimate of the final mark of a student who obtained 60 on the
test but was ill at the time of the final examination? What is the standard error of this
estimate?

9. The two regression lines for variates X and Y have been computed as 4x − 5y +
33 = 0 and 20x − 9y − 107 = 0. Given that the variance of X is 9, calculate the
variance of Y, the means for X and Y and the coefficient of correlation between X
and Y.


C (§§ 11.7-11.11)


1. The following table gives death-rates per 100,000 in the United States from
typhoid fever for the years from 1900-1920:

Year   Rate     Year   Rate     Year   Rate
1900   31.3     1907   20.5     1914   10.8
1901   27.5     1908   19.6     1915    9.2
1902   26.3     1909   17.2     1916    8.8
1903   24.6     1910   18.0     1917    8.1
1904   23.9     1911   15.3     1918    7.0
1905   22.4     1912   13.2     1919    4.8
1906   22.0     1913   12.6     1920    5.0

Find the best-fitting straight line for these data. If the linear trend had continued,
estimate the date at which typhoid fever would have been wiped out in the United
States. Hint: Take the origin of X (the date) at 1910, so that the values of x are -10,
-9, etc.

2. In the following table, x is the amount of irrigation water (inches) applied to
an experimental farm in India and y is the yield of rice in tons/acre [13].

x   12     18     24     30     36     42     48

y   5.27   5.68   6.25   7.21   8.02   8.71   8.42

m   1    2    3    4    5    6    7    8

y   99   132  164  197  225  261  292  325

4. Find the best-fitting line for the data of Problem C.3 by the method of grouping,
using (a) two groups of 5, (b) three groups of 3, 4, and 3 respectively. Find 90%
confidence intervals for β in both cases.

In method (b), obtain an estimator for $\sigma_u$, assuming that $\sigma_v^2 = 16\sigma_u^2$.

D (§§ 11.12-11.13)

1. [14] Over a period of 20 years the mean wheat yield of eastern England was
found to be correlated with the autumn rainfall, with r = −0.629. Is this significantly
different from zero at the 1% level? Is it significantly different from −0.3?

35.7 38.8

2. In a sample of 25 pairs of individuals (parent and child), the correlation in a
certain character was found to be 0.60. Obtain 90% confidence limits for the
population correlation coefficient. Could one conclude, with 90% confidence, that the true
value (in the population sampled) was at least 0.40?

3. One random sample of 28 from a certain bivariate population gave r = 0.60; 
another independent random sample of 23 gave r = 0.40. Is the difference significant 
at the 5% level? Hint: Use a two-tailed test for the difference, after making the 
Fisher transformation. 

4. For a sample of size 30 from a bivariate normal population, r was found to be
0.684. An independent sample of size 40 gave r = 0.719. What estimate would you
suggest for the true value of ρ?

5. Obtain an estimator of p from the sampling distribution of r (Eqs. 11.12.7 and 
11.12.8) by finding that value of p for which f(r, p) is a maximum, with a given r. 
Is this estimator unbiased? Hint: Put d(log f)/dp = 0 and solve the quadratic equation 
for p as far as terms of order 1/N. Neglect all terms in S(pr) except the first. 


E (§§ 11.14-11.17)

1. (Garrett) Twelve salesmen were ranked in order of merit for efficiency (X) by
their manager. The ranking (Y) in accordance with length of service is also given in
the following table. What correlation is there between length of service and efficiency?

Salesmen A B C D E F G H J K L M
X 6 12 1 9 8 5 2 10 3 7 4 11 
Ү i INS 2 4 6 9 Е 15 5 75 3 10 


Calculate both Spearman's and Kendall's coefficients, correcting for the ties in the Y
ranking.

2. The scores of 10 students on two tests are given in the following table. Calculate 
the Pearson coefficient of correlation for the actual scores, and the Spearman coefficient 
for the ranks. 


Student A B C D E F G H J K
X 92 89 87 86 83 77 71 62 53 40 
Y 88 85 93 79 70 87 52 84 41 64 


3. If a sample of seven pairs is drawn from a population of values of independent
variates X and Y, it is known that the computed Spearman coefficient will exceed
0.714 in not more than 5% of cases and will exceed 0.893 in not more than 1% of cases.
What conclusion may be drawn regarding the judges in Example 8 of § 11.14? Apply
the Student-t approximation of § 11.17 to the same problem.

4. In a drama competition, ten plays were ranked independently by two adjudica- 
tors, as follows: 


Play     | A  B  C  D   E  F  G  H  J  K
Rank (X) | 5  2  6  8   1  7  4  9  3  10
Rank (Y) | 1  7  6  10  4  5  3  8  2  9

Calculate the coefficient of rank correlation by both the Spearman and Kendall
formulas. Would you say that there is a significant measure of agreement between the two
adjudicators? Hint: Use the normal approximation to the variance of S, and make the
correction for continuity.




F (§§ 11.18-11.22)

1. In the accompanying contingency table, X represents a rating given to each of a
group of university freshmen on the basis of high school reports and Y represents the
final standing in degree examinations for the same group. Discuss the association
between these two attributes, and calculate the coefficient of contingency C defined
in § 11.18.

                        X
Y             Fair    Good    Excellent
3rd class      73      67        10
2nd class      64      84        15
1st class       5      24        28

2. In a public opinion survey the following questions were asked: (1) Do you drink
beer? (2) Are you in favour of local option on the sale of liquor? In one district the
results (excluding those who had no opinions) were as indicated.

                For Local Option    Against
Drinkers               18              39
Non-drinkers           45              37

Does this provide good evidence of an association between drinking habits and opinion
on the subject of local option?

3. Two batches of 12 experimental animals, one batch inoculated and the other
not inoculated, were exposed to infection under comparable conditions. Of the
inoculated group, 2 died and 10 survived; of the other group 8 died and 4 survived.
Does this observation provide evidence (at the 5% [...]

4. [...] junior high,
senior high or college. The results were:

          | Junior High   Senior High   College
Male      |     13            25           12
Female    |     23            20            7

Is there a significant association between sex and educational level? Hint: Use the
Brandt-Snedecor formula.

5. Two groups of freshmen applying to enter a university took the same college
aptitude test. The groups (A and B) differed in the type of high school education
they had experienced. The frequency distributions of scores for the two groups were
as follows:

Score     0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99
Group A    7     68     66     47     51     39     43     39     33     18
Group B   22      8     14     12      3     13      3     14     12     10

Calculate the value of $\chi^2$ and determine whether there is a significant difference in
college aptitude between the groups.

6. Prove that if ad < bc, then $\chi^2$ for the table

a + ½     b − ½
c − ½     d + ½

is given by $N(|ad - bc| - N/2)^2/(r_1 r_2 c_1 c_2)$ where $r_1 = a + b$, $r_2 = c + d$, $c_1 = a + c$,
$c_2 = b + d$.

7. It has been suggested by V. M. Dandekar (see [13], p. 388) that a better
approximation than that given by Yates, to the true probability P for a 2 × 2 table, may be
obtained by subtracting from the uncorrected value $\chi_0^2$ the term
$(\chi_{-1}^2 - \chi_0^2)(\chi_0^2 - \chi_1^2)/(\chi_{-1}^2 - \chi_1^2)$
where $\chi_1^2$ and $\chi_{-1}^2$ are the values obtained by respectively increasing and
decreasing the smallest frequency in the table by unity. Test this suggestion on the
data of Problem F-3.

8. Prove the Brandt-Snedecor formula for $\chi_s^2$, Eq. (11.19.1).

Chapter 12


REGRESSION ANALYSIS AND
CURVE FITTING

12.1 The Equations of Multiple Regression   In the last chapter we considered
the relations between two variates X and Y. We now suppose that Y depends
on p other variates which will be denoted by $X_1, X_2 \ldots X_p$. These need not be
independent, and in fact may all be powers of a single variate X. We shall call
$X_1 \ldots X_p$ the predictors and Y the predicted (or dependent) variate. The usual
problem is to find the best linear predicting equation for Y (in the least squares
sense) of the form:

(12.1.1)   $y_c = \sum_i b_i x_i, \qquad i = 0, 1, 2 \ldots p$

To avoid introducing a separate constant term, the first variate $X_0$ is a
dummy which always takes the value 1. The coefficient $b_0$ is then the constant,
denoted in Chapter 11 by a, and $b_1$ is the previous b. The results of this chapter
reduce to those of Chapter 11 when p = 1.

The coefficients $b_i$ are called partial regression coefficients. They are
estimators of the true regression coefficients $\beta_i$ which are supposed to characterize
the population, and they are calculated from a set of observations of each of
the p + 1 variates made on N individuals from the population. We shall denote
the observed value of $X_i$ for the individual numbered α by $x_{i\alpha}$ (i = 1, 2 ... p;
α = 1, 2 ... N). The set of all N(p + 1) observations may be written in a
matrix

$$\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1N} \\
x_{21} & x_{22} & \cdots & x_{2N} \\
\vdots &        &        & \vdots \\
x_{p1} & x_{p2} & \cdots & x_{pN} \\
y_1    & y_2    & \cdots & y_N
\end{pmatrix}$$

with p + 1 rows and N columns and much of the material in this chapter is
most conveniently expressed in the notation of matrix algebra. For those who
are unfamiliar with this subject, a brief discussion of the principal ideas will
be found in the Appendix, §§ A.18-23.

The true regression equation in the population is supposed to be

(12.1.2)   $\eta_\alpha = \sum_i \beta_i x_{i\alpha}, \qquad i = 0, 1, 2 \ldots p$

The $x_{i\alpha}$ are fixed numbers, or at least the errors in $x_i$ are small compared with
the error in y. The $b_i$ will therefore be chosen to minimize the sum of squares
of the differences between the observed $y_\alpha$ and the theoretical $\eta_\alpha$. We shall use
the symbol $\sum$ to denote summation over variates with respect to i (sometimes
j or k) and S to denote summation over individuals with respect to α (sometimes
β or γ). The least-squares condition becomes

(12.1.3)   $S_\alpha \left(y_\alpha - \sum_i \beta_i x_{i\alpha}\right)^2 = \text{minimum}$

On differentiating with respect to the $\beta_j$ and equating the derivatives to zero,
we have for the estimators $\hat{\beta}_i$ the equations

$S_\alpha\, x_{j\alpha}\left(y_\alpha - \sum_i \hat{\beta}_i x_{i\alpha}\right) = 0, \qquad i, j = 0, 1, 2 \ldots p$

or, equivalently,

(12.1.4)   $S_\alpha \sum_{i=0}^{p} x_{i\alpha} x_{j\alpha} \hat{\beta}_i = S_\alpha\, x_{j\alpha} y_\alpha, \qquad j = 0, 1, 2 \ldots p$

This is a system of p + 1 linear equations in the p + 1 unknowns $\hat{\beta}_i$. Written
out in full, with $\hat{\beta}_i = b_i$, they are

(12.1.5)
$$\begin{aligned}
b_0 a_{00} + b_1 a_{01} + \cdots + b_p a_{0p} &= g_0 \\
b_0 a_{10} + b_1 a_{11} + \cdots + b_p a_{1p} &= g_1 \\
b_0 a_{20} + b_1 a_{21} + \cdots + b_p a_{2p} &= g_2 \\
\cdots \qquad & \\
b_0 a_{p0} + b_1 a_{p1} + \cdots + b_p a_{pp} &= g_p
\end{aligned}$$

where

(12.1.6)   $a_{ij} = S_\alpha\, x_{i\alpha} x_{j\alpha}, \qquad g_i = S_\alpha\, x_{i\alpha} y_\alpha$

This system is called the set of normal equations of the regression problem.
It is clear from the definition of $a_{ij}$ in (12.1.6) that $a_{ij} = a_{ji}$. The set of coefficients
$a_{ij}$ in the normal equations therefore forms a symmetric matrix. If we denote
this square symmetric matrix by A and let b and g denote the one-column
matrices (usually called column vectors)

$$b = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_p \end{pmatrix}, \qquad
g = \begin{pmatrix} g_0 \\ g_1 \\ \vdots \\ g_p \end{pmatrix}$$

the equations of (12.1.5) may be written in the compact matrix form

(12.1.7)   $Ab = g$

The matrix solution of these equations is

(12.1.8)   $b = A^{-1} g$

where $A^{-1}$ is the inverse of A, that is, the matrix which when multiplied by A
becomes the unit matrix. One method of inverting a matrix is given in Appendix
A.23. There are other, and perhaps speedier, methods, but this one is
straightforward and systematic.
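The construction of the normal equations from Eq. (12.1.6) and their solution can be sketched as follows. The data are hypothetical and chosen to lie exactly on $y = 1 + 2x_1 + 3x_2$, so the recovered coefficients are known in advance; the elimination routine is a generic one and is not the inversion method of Appendix A.23.

```python
def normal_equations(X, y):
    """Build A (a_ij = S x_i x_j) and g (g_i = S x_i y) from predictor rows X."""
    A = [[sum(u * v for u, v in zip(ri, rj)) for rj in X] for ri in X]
    g = [sum(u * ya for u, ya in zip(ri, y)) for ri in X]
    return A, g

def solve_linear_system(A, g):
    """Solve A b = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    M = [row[:] + [gi] for row, gi in zip(A, g)]     # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    b = [0.0] * n
    for k in reversed(range(n)):                      # back substitution
        tail = sum(M[k][c] * b[c] for c in range(k + 1, n))
        b[k] = (M[k][n] - tail) / M[k][k]
    return b

# Hypothetical data: row 0 is the dummy variate x_0 = 1, rows 1 and 2 are predictors.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
X = [[1.0] * 5, x1, x2]
y = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x1, x2)]

A, g = normal_equations(X, y)
b = solve_linear_system(A, g)
print(b)
```

Because the hypothetical y values contain no error, b reproduces (1, 2, 3) up to rounding; with real data the same steps give the least-squares estimates.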




12.2 The Regression Equations and Maximum Likelihood   If $y_\alpha = \eta_\alpha + \varepsilon_\alpha$,
and if the $\varepsilon_\alpha$ may be assumed to be independently and normally distributed about
zero with a common variance $\sigma^2$, the joint probability density for the set of ε's
is $L = \sigma^{-N}(2\pi)^{-N/2} \exp[-S(\varepsilon_\alpha^2)/2\sigma^2]$. Therefore,

(12.2.1)   $\log L = -N \log \sigma - \frac{N}{2} \log(2\pi) - \frac{S(\varepsilon_\alpha^2)}{2\sigma^2}$

The condition for maximum L is clearly the same as for minimum $S(\varepsilon_\alpha^2)$, which
is equivalent to Eq. (12.1.3). The method of maximum likelihood leads therefore
to the same normal equations as the method of least squares.

We may also consider the b's as linear functions of the $y_\alpha$, chosen so as to
be the best unbiased estimators of the β's. If by "best" we mean having minimum
variance (and therefore maximum precision), it may be shown [1] that by using
this criterion we again arrive at the same set of normal equations.

12.3 The Solution of the Normal Equations   The normal equations are

(12.3.1)   $\sum_j a_{ij} b_j = g_i, \qquad i = 0, 1, 2 \ldots p$

Besides the matrix solution of Eq. (12.1.8), there is an elegant theoretical
solution provided by Cramer's rule, namely,

(12.3.2)   $b_i = \frac{d_i(A)}{d(A)}$

where d(A) is the determinant of the matrix A (assumed to be non-singular)
and $d_i(A)$ is the determinant of the matrix derived from A by replacing its $i$th
column by the column of g's. However, for p > 3, it is generally safer to use
a process of systematic elimination of the unknowns, one at a time. One such
process is the Square Root Method, often called Choleski's method ([2], [3]).
Details of the process, with an illustrative example, are given in Appendix A.24.
There are several other methods available, but this is one of the more compact
schemes.
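For orientation, a square-root (Choleski) solution of Ab = g might be sketched as below. This follows the usual textbook factorization A = LL' of a symmetric positive-definite matrix and is not necessarily laid out like the scheme of Appendix A.24; the 2 × 2 system is hypothetical.

```python
import math

def cholesky_solve(A, g):
    """Solve A b = g for symmetric positive-definite A via A = L L'."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)      # the "square root" step
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    # Forward substitution: L z = g
    z = []
    for i in range(n):
        z.append((g[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    # Back substitution: L' b = z
    b = [0.0] * n
    for i in reversed(range(n)):
        b[i] = (z[i] - sum(L[k][i] * b[k] for k in range(i + 1, n))) / L[i][i]
    return b

# Hypothetical normal equations: 4 b0 + 2 b1 = 10, 2 b0 + 3 b1 = 8.
A = [[4.0, 2.0],
     [2.0, 3.0]]
g = [10.0, 8.0]

b = cholesky_solve(A, g)
print(b)
```

The matrix of a set of normal equations is symmetric, and positive definite whenever the predictors are not linearly dependent, which is what makes the square-root step legitimate.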


12.4 The Variances and Covariances of the Regression Coefficients From 
the assumptions mentioned in $ 12.2, it follows that the expectation of b; is 
equal to В; and that the covariance of b; and b; is o? a", where a! is an element 
of the inverse matrix A~?. 
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Since g; = бхыу, and b; = } ав, we have 


(12.4.1) b; 2 Y a? S ху, 
, j 
Now the expectation of y, is nz = У, Вуху, so that 
(12.4.2) E(b) = Y a $ xj, b ГЕЗА 
1 
= Y By У aay, 
= уз В.б 
where ди, = | when i = k and 0 when i z А. Therefore, 
(12.4.3) Е(Ь) = В, 


Since e, and єр are supposed to be independent, the covariance of y, and уз 
is given by 


(12.4.4) C( Vas Ур) = дра? 
The x’s being fixed, 
(12.4.5) С(9,9) = $ 5 xix; CO Ур) 


= 0? $ XiX ja 

= 724; 
Since the only non-zero term in the sum over В arises when В = а. Then 
(12.4.6) C(b;, b;) = У а“ Y a?'C(g,, 9) 

= у аќаа?ац 


k,l 
— в? Y a^ y, = с?аї 
к 


The elements of the matrix A^ !, multiplied by о?, give therefore the variances 
and covariances of the regression coefficients. The diagonal terms in particular 
give the variances (and hence the estimated standard errors) of the regression 
Coefficients, 


The variance of the predicted value y,, given by Eq. (12.1.1) for some new 
Set of values x,, x)... x, of the predictors, is obtained from Eq. (6). In fact, 


(12.4.7) И.) = e| 25 (b; — Box 2 = 2; x;x Е; — Bi(b; — Bj)] 
= У хухуС(Ь,, bj) 
ij 
20? Y alxx; 
ij 


The variance of the observed y which would correspond to the new observed 
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set x, ...x, is found by adding о?, which is the variance about the regression 
plane (or hyperplane). It becomes 


(12.4.8) (у) = г + У axx] 
ij 


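This is a generalization to p + 1 dimensions of the two-dimensional relation
of Eq. (11.5.16). For if $x_0 = 1$ and $x_1 = x$, the 2 by 2 matrix A is

$A = \begin{pmatrix} N & N\bar{x} \\ N\bar{x} & S x_\alpha^2 \end{pmatrix}$

and its determinant is $d(A) = N(N - 1)s_x^2$. The inverse matrix is

$A^{-1} = \dfrac{1}{(N - 1)s_x^2} \begin{pmatrix} \dfrac{N - 1}{N}s_x^2 + \bar{x}^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}$

Therefore,

$\sum_{i,j} a^{ij} x_i x_j = a^{00} + 2a^{01}x + a^{11}x^2$
  $= \dfrac{\frac{N - 1}{N}s_x^2 + \bar{x}^2 - 2x\bar{x} + x^2}{(N - 1)s_x^2}$
  $= \dfrac{1}{N} + \dfrac{(x - \bar{x})^2}{(N - 1)s_x^2}$

as in § 11.5.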
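
The reduction just described can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `quad_form_2d` and the four x-values are hypothetical, chosen only for illustration. It evaluates the quadratic form once from the elements of the inverse of A and once from the closed expression of § 11.5.

```python
def quad_form_2d(xs, x_new):
    """Return the quadratic form sum a^{ij} x_i x_j (with x_0 = 1, x_1 = x_new)
    computed two ways: from the inverse of A, and from the closed formula."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx              # d(A) = N (N - 1) s_x^2
    # Elements of the inverse of A = [[N, Sx], [Sx, Sx^2]]
    a00, a01, a11 = sxx / det, -sx / det, n / det
    via_matrix = a00 + 2 * a01 * x_new + a11 * x_new ** 2
    xbar = sx / n
    ss_x = sxx - n * xbar ** 2           # (N - 1) s_x^2
    via_formula = 1 / n + (x_new - xbar) ** 2 / ss_x
    return via_matrix, via_formula


# Hypothetical predictor values and a new point at which to predict.
via_matrix, via_formula = quad_form_2d([1.0, 2.0, 4.0, 7.0], 5.0)
print(via_matrix, via_formula)
```

Multiplying either value by $\sigma^2$ gives the variance of the predicted value at the new point; adding $\sigma^2$ gives the variance of a new observation there.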


12.5 Residuals The difference between the observed value $y_\alpha$ and the computed
value $y_{c\alpha}$, for a given individual in the sample, is called a residual, and
is usually denoted by $v_\alpha$.

(12.5.1)  $v_\alpha = y_\alpha - y_{c\alpha} = y_\alpha - \sum_i b_i x_{i\alpha}$

This is not the same as the true error $\delta_\alpha$, which may be defined by

(12.5.2)  $\delta_\alpha = y_\alpha - \eta_\alpha = y_\alpha - \sum_i \beta_i x_{i\alpha}$
If v is the column vector of the $v_\alpha$ ($\alpha = 1, 2 \ldots N$) and X is the (p + 1)-by-N
matrix $(x_{i\alpha})$, then

$Xv = X(y - X'b) = g - Ab$

where X' is the transpose of X (with rows and columns interchanged). This
follows from the definitions of $a_{ij}$ and $g_i$ in (12.1.6), which in matrix notation
become XX' = A and Xy = g. But by (12.1.7), g = Ab, and therefore

(12.5.3)  $Xv = 0$

This is equivalent to the set of equations

(12.5.4)  $S\,x_{i\alpha} v_\alpha = 0, \quad i = 0, 1, 2 \ldots p$

The residuals are therefore said to be orthogonal to each of the predictors
$x_0, x_1 \ldots x_p$.
The sum of squares of the residuals, which is the minimum sum of squares
in (12.1.3), may be written

$S\,v_\alpha^2 = v'v = (y - X'b)'v = (y' - b'X)v = y'v$

since Xv = 0. Therefore,

(12.5.5)  $S\,v_\alpha^2 = y'(y - X'b) = y'y - g'b$

since g' = y'X'. In scalar notation this is

(12.5.6)  $S\,v_\alpha^2 = S\,y_\alpha^2 - \sum_i b_i g_i$

This equation gives another method of computing the sum of squares of residuals.
It should be noted that since the two terms on the right-hand side are often
nearly equal in magnitude, the values of $b_i$ used should be correct to several
more significant figures than are required in the final sum of squares.
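
As an illustration of Eqs. (12.5.4) and (12.5.6), the sketch below fits a straight line to a small set of hypothetical values (the data and the helper name `fit_line` are assumptions made for this example only) and confirms that the residuals are orthogonal to both predictors and that the short formula agrees with the direct sum of squares.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = b0 + b1*x via the normal equations.
    Returns b0, b1 and the right-hand sides g0 = S y, g1 = S x y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1, (sy, sxy)


# Hypothetical observations, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, (g0, g1) = fit_line(xs, ys)
v = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]      # residuals, Eq. (12.5.1)

orth_x0 = sum(v)                                      # S x_0 v, with x_0 = 1
orth_x1 = sum(x * r for x, r in zip(xs, v))           # S x_1 v
direct = sum(r * r for r in v)                        # S v^2 computed directly
shortcut = sum(y * y for y in ys) - (b0 * g0 + b1 * g1)   # Eq. (12.5.6)

print(orth_x0, orth_x1, direct, shortcut)
```

The short formula subtracts two nearly equal quantities, which is why the text advises carrying extra significant figures in the $b_i$ when working by hand.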


12.6 Distribution of the Sum of Squares of Residuals Since $y_\alpha = \eta_\alpha + \delta_\alpha$,
and since the $\delta_\alpha$ are supposed to be distributed with expectation zero and
variance $\sigma^2$, we have

$E(y_\alpha^2) = E(\eta_\alpha^2 + 2\eta_\alpha\delta_\alpha + \delta_\alpha^2) = \eta_\alpha^2 + \sigma^2$

Now $\eta_\alpha = \sum_i \beta_i x_{i\alpha}$, so that

$S\,\eta_\alpha^2 = \sum_{i,j} \beta_i\beta_j\, S\, x_{i\alpha} x_{j\alpha} = \sum_{i,j} \beta_i\beta_j a_{ij}$

Therefore,

(12.6.1)  $E(S\,y_\alpha^2) = \sum_{i,j} \beta_i\beta_j a_{ij} + N\sigma^2$

Also, $\sum_i b_i g_i = \sum_{i,j} a_{ij} b_i b_j$, so that

(12.6.2)  $E\left(\sum_i b_i g_i\right) = \sum_{i,j} a_{ij}\, E(b_i b_j)$

By Eqs. (12.4.3) and (12.4.6), $E(b_i b_j) = \beta_i\beta_j + \sigma^2 a^{ij}$ and therefore,

(12.6.3)  $E\left(\sum_i b_i g_i\right) = \sum_{i,j} (\beta_i\beta_j + \sigma^2 a^{ij}) a_{ij} = \sum_{i,j} \beta_i\beta_j a_{ij} + \sigma^2(p + 1)$

since $\sum_{i,j} a^{ij} a_{ij} = \sum_i \delta_{ii} = p + 1$, $\delta_{ii}$ being 1 for each of its p + 1 values.
Substituting Eqs. (1) and (3) in Eq. (12.5.6), we obtain

(12.6.4)  $E(S\,v_\alpha^2) = \sigma^2(N - p - 1)$

which means that an unbiased estimator of $\sigma^2$ is furnished by

(12.6.5)  $\hat{\sigma}^2 = \dfrac{1}{N - p - 1}\, S\,v_\alpha^2$

It may be proved (e.g., [4]) that if the $\delta_\alpha$ are assumed also to be normal,
then $(N - p - 1)\hat{\sigma}^2/\sigma^2$ has the $\chi^2$ distribution with N - p - 1 degrees of
freedom. Moreover, on this assumption the $b_i$ are normally distributed and are
independent of $\hat{\sigma}^2$. This means that the Student t distribution can be used to
fix confidence intervals for the $\beta_i$. In fact,

(12.6.6)  $t = \dfrac{b_i - \beta_i}{(a^{ii}\hat{\sigma}^2)^{1/2}}$

with N - p - 1 degrees of freedom, so that if $t_\alpha$ corresponds to a confidence
coefficient of $100(1 - \alpha)\%$,

(12.6.7)  $b_i - \hat{\sigma}(a^{ii})^{1/2} t_\alpha < \beta_i < b_i + \hat{\sigma}(a^{ii})^{1/2} t_\alpha$

The variance of the difference of two coefficients $b_i$ and $b_j$ is given by

(12.6.8)  $V(b_i - b_j) = V(b_i) + V(b_j) - 2C(b_i, b_j) = \sigma^2(a^{ii} + a^{jj} - 2a^{ij})$

and this may be used to test whether two coefficients differ significantly.
EXAMPLE 1 The following artificial data are supposed to represent the
yield y of a chemical reaction under different conditions of (a) time of reaction,
(b) temperature, (c) amount of an added ingredient. Each variate $x_i$ (i = 1, 2, 3)
takes only two values, which we may code as -1 and 1. The variate $x_0$ is a
dummy which always has the value 1. The matrix X therefore has the form

$X = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 & -1 & 1 & -1 & 1 \\ -1 & -1 & 1 & 1 & -1 & -1 & 1 & 1 \\ -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 \end{pmatrix}$

and y' (the row vector of observations) is y' = [61, 83, 51, 70, 66, 92, 56, 83].
Then

$A = XX' = \begin{pmatrix} 8 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 0 & 0 & 8 \end{pmatrix}$

and

$g = Xy = \begin{pmatrix} 562 \\ 94 \\ -42 \\ 32 \end{pmatrix}$

Therefore,

$A^{-1} = \dfrac{1}{8}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

and

$b = A^{-1}g = \begin{pmatrix} 70.25 \\ 11.75 \\ -5.25 \\ 4.00 \end{pmatrix}$

The fitted equation is

$y_c = 70.25 + 11.75x_1 - 5.25x_2 + 4.00x_3$

The residuals are given by

v' = [1.25, -0.25, 1.75, -2.75, -1.75, 0.75, -1.25, 2.25]

and $S\,v_\alpha^2 = 360/16 = 22.5$.

This sum of squares of residuals represents both experimental error and the
inadequacy of the linear model. Unless the experimental error can be independently
estimated (for example, by replicated observations) there is no good
way of telling whether the linear model is satisfactory.

The estimator of $\sigma^2$ in this example is

$\hat{\sigma}^2 = \dfrac{22.5}{8 - 4} = 5.625$

For four degrees of freedom, $t_\alpha$ corresponding to a confidence coefficient of
95% is 2.78, and for each value of i, $a^{ii} = 1/8$. Therefore the confidence interval
for each $\beta_i$ is $b_i \pm 2.78(0.703)^{1/2} = b_i \pm 2.33$.
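
The arithmetic of Example 1 can be retraced with a short script. This is a sketch only, not the author's procedure: it relies on the fact that A is diagonal for this design, so that each $a^{ii}$ is simply the reciprocal of $a_{ii}$ (a general design would need a full matrix inversion), and the value 2.78 for t is taken from the text rather than computed.

```python
# Design matrix X (rows are the variates x0..x3) and observations y of Example 1.
X = [
    [1, 1, 1, 1, 1, 1, 1, 1],          # x0: dummy variate
    [-1, 1, -1, 1, -1, 1, -1, 1],      # x1: time of reaction
    [-1, -1, 1, 1, -1, -1, 1, 1],      # x2: temperature
    [-1, -1, -1, -1, 1, 1, 1, 1],      # x3: added ingredient
]
y = [61, 83, 51, 70, 66, 92, 56, 83]
N, n_coef = len(y), len(X)             # n_coef = p + 1

# A = XX' and g = Xy, as in Eq. (12.1.6).
A = [[sum(X[i][k] * X[j][k] for k in range(N)) for j in range(n_coef)]
     for i in range(n_coef)]
g = [sum(X[i][k] * y[k] for k in range(N)) for i in range(n_coef)]

# The shortcut below is valid only because A is diagonal for this design.
assert all(A[i][j] == 0 for i in range(n_coef) for j in range(n_coef) if i != j)
a_inv_diag = [1 / A[i][i] for i in range(n_coef)]
b = [a_inv_diag[i] * g[i] for i in range(n_coef)]

# Residuals, their sum of squares, and the estimator of Eq. (12.6.5).
v = [y[k] - sum(b[i] * X[i][k] for i in range(n_coef)) for k in range(N)]
ss_resid = sum(r * r for r in v)
sigma2_hat = ss_resid / (N - n_coef)

# Half-width of the 95% interval of Eq. (12.6.7); t = 2.78 for 4 d.f. (from the text).
t_crit = 2.78
half_width = t_crit * (a_inv_diag[0] * sigma2_hat) ** 0.5

print(b, v, ss_resid, sigma2_hat, half_width)
```

Because every $a^{ii}$ is the same here, the half-width is the same for all four coefficients.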


12.7 Fitting a Polynomial of Second or Higher Degree Since the predictors
$x_1 \ldots x_p$ of § 12.1 are not assumed to be independent, they may be taken as
powers of a single variate X, say $X, X^2, \ldots X^p$, and the method of least squares
may then be used to fit a polynomial of degree p to a set of N observations
of pairs of values $(x_\alpha, y_\alpha)$. The values of $x_\alpha$ may be chosen arbitrarily and the
computations will be simplified if they can be taken as equally spaced along the
x-axis. Instead of Eq. (12.1.6) we now have

(12.7.1)  $a_{ij} = S\,x_\alpha^{i+j}, \quad g_i = S\,x_\alpha^i y_\alpha, \quad i, j = 0, 1, \ldots p$

Thus if we wish to fit the quadratic

(12.7.2)  $y_c = b_0 + b_1 x + b_2 x^2$

to a set of N pairs $(x_\alpha, y_\alpha)$, the equation giving the $b_i$ is

$Ab = g$

or, written out,

(12.7.3)  $\begin{pmatrix} N & S x_\alpha & S x_\alpha^2 \\ S x_\alpha & S x_\alpha^2 & S x_\alpha^3 \\ S x_\alpha^2 & S x_\alpha^3 & S x_\alpha^4 \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} S y_\alpha \\ S x_\alpha y_\alpha \\ S x_\alpha^2 y_\alpha \end{pmatrix}$

If we choose the unit of x so that the values (assumed equally spaced)
increase by 1 from one observation to the next, and if we take the origin of x
midway between the first and last observations, we shall have $S x_\alpha = S x_\alpha^3 = 0$,
and this will considerably shorten the calculations. The equations of (3) then
become:

(12.7.4)  $N b_0 + S x_\alpha^2\, b_2 = S y_\alpha$
          $S x_\alpha^2\, b_1 = S x_\alpha y_\alpha$
          $S x_\alpha^2\, b_0 + S x_\alpha^4\, b_2 = S x_\alpha^2 y_\alpha$
EXAMPLE 2 Suppose corresponding values of $x_\alpha$ and $y_\alpha$ are as given in the
following table:

x     5     15    25    35    45    55    65    75    85    95
y     10.0  8.1   9.3   12.1  13.6  17.5  20.0  24.0  30.0  42.5
u     -4.5  -3.5  -2.5  -1.5  -0.5  0.5   1.5   2.5   3.5   4.5

If we replace x by u = (x - 50)/10, the conditions $S u_\alpha = S u_\alpha^3 = 0$ will be
satisfied. Also $S u_\alpha^2 = 82.5$, $S u_\alpha^4 = 1208.625 \approx 1208.6$, $S y_\alpha = 187.1$,
$S u_\alpha y_\alpha = 273.45$, and $S u_\alpha^2 y_\alpha = 1817.98$. The equations for $b_0, b_1, b_2$ (in
$y_c = b_0 + b_1 u + b_2 u^2$) are, therefore,

$10b_0 + 82.5b_2 = 187.1$
$82.5b_1 = 273.45$
$82.5b_0 + 1208.6b_2 = 1817.98$

from which $b_0 = 14.4225$, $b_1 = 3.3145$, and $b_2 = 0.5197$. In terms of x, the
best-fitting quadratic (or parabola) is

$y_c = 14.4225 + 0.33145(x - 50) + 0.005197(x - 50)^2$
    $= 10.842 - 0.1882x + 0.005197x^2$

The goodness of fit may be estimated from the sum of squares of residuals,
which in this case amounts to 22.36. If a straight line were fitted by the same
least squares process the equation would be

$y_c = 18.71 + 3.3145u$
    $= 2.138 + 0.33145x$

The sum of squares of residuals is 164.95, so that the fit of the parabola is
apparently considerably better than that of the straight line. The relation
between quadratic and linear regression may be brought out by an analysis
of variance, as in Table 12.1.


TABLE 12.1

Variation                                   S.S.      D.F.   M.S.
Total [S(y_α - ȳ)²]                         1071.33   9
Linear regression [S(y_c - ȳ)²]             906.38    1      906.38
About linear regression [S v_α²]            164.95    8      20.62
Quadratic regression [S(y_c - ȳ)²]          1048.97   2      524.48
About quadratic regression [S v_α²]         22.36     7      3.19


The variations "about regression" are the sums of squares of the residuals
for the two fitted lines. The variations "due to regression" are calculated by
difference from the total S.S., which is $S y_\alpha^2 - \frac{1}{N}(S y_\alpha)^2$. Since there are two
constants for the straight line and three for the parabola, calculated from the
data, the degrees of freedom for variation about regression are N - 2 and
N - 3 respectively.

The reduction in S.S., due to replacing the straight line by the parabola, is
164.95 - 22.36 = 142.59, with 1 d.f. This reduction may be compared with
the S.S. about the parabola (22.36 with 7 d.f.). The F-value is clearly highly
significant (F = 44.7 with 1 and 7 d.f.).
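
The computations of Example 2 may be sketched as follows. The script solves the simplified normal equations (12.7.4) for the coded variate u and forms the F ratio for the quadratic term; the helper name `ss_resid` is an assumption of this sketch. Because the text works with rounded intermediate values, the script's results should agree with the printed figures only to within rounding.

```python
# Data of Example 2; u is the coded variate (x - 50)/10.
x = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
y = [10.0, 8.1, 9.3, 12.1, 13.6, 17.5, 20.0, 24.0, 30.0, 42.5]
u = [(xi - 50) / 10 for xi in x]
N = len(y)

s_u2 = sum(ui ** 2 for ui in u)
s_u4 = sum(ui ** 4 for ui in u)
s_y = sum(y)
s_uy = sum(ui * yi for ui, yi in zip(u, y))
s_u2y = sum(ui ** 2 * yi for ui, yi in zip(u, y))

# Eq. (12.7.4): b1 stands alone; b0 and b2 come from a 2-by-2 system.
b1 = s_uy / s_u2
det = N * s_u4 - s_u2 ** 2
b0 = (s_y * s_u4 - s_u2 * s_u2y) / det
b2 = (N * s_u2y - s_u2 * s_y) / det


def ss_resid(coefs):
    """Sum of squared residuals about the polynomial sum coefs[k] * u**k."""
    return sum(
        (yi - sum(c * ui ** k for k, c in enumerate(coefs))) ** 2
        for ui, yi in zip(u, y)
    )


# With S u = 0 the best straight line has intercept ybar and the same slope b1.
ss_linear = ss_resid([s_y / N, b1])
ss_quadratic = ss_resid([b0, b1, b2])

# Reduction due to the quadratic term, tested against the S.S. about the parabola.
F = (ss_linear - ss_quadratic) / (ss_quadratic / (N - 3))

print(b0, b1, b2, ss_linear, ss_quadratic, F)
```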


* 12.8 Orthogonal Polynomials The method of § 12.7 has the disadvantage
that if we want to improve the fit by using a higher degree polynomial than one
already fitted (a cubic instead of a quadratic, for example) the coefficients for
the new polynomial have to be calculated afresh from the beginning. The
method of orthogonal polynomials, suggested by R. A. Fisher, allows us to add
new terms independently of those already calculated. Incidentally, tests of
significance of the coefficients are simplified.

Two polynomials $P_1(x)$ and $P_2(x)$ are said to be orthogonal for the set of
values $x_\alpha$ ($\alpha = 1, 2 \ldots N$) if

(12.8.1)  $S\,[P_1(x_\alpha) \cdot P_2(x_\alpha)] = 0$

For example, the polynomials $P_1 = x - 4$, $P_2 = x^2 - 8x + 12$, $P_3 =
x^3 - 12x^2 + 41x - 36$, are orthogonal to each other and to $P_0 = 1$ for
x = 1, 2, 3 ... 7, as is evident from the following table of values:

TABLE 12.2

x      P0P1   P0P2   P0P3   P1P2   P1P3   P2P3
1      -3     5      -6     -15    18     -30
2      -2     0      6      0      -12    0
3      -1     -3     6      3      -6     -18
4      0      -4     0      0      0      0
5      1      -3     -6     -3     -6     18
6      2      0      -6     0      -12    0
7      3      5      6      15     18     30

Sum    0      0      0      0      0      0
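
The column totals of Table 12.2 can be confirmed directly from the definition (12.8.1). The sketch below is illustrative only; the dictionary `polys` and the name `cross_sums` are assumptions of this example.

```python
from itertools import combinations

# The polynomials P0..P3 quoted in the text.
polys = {
    "P0": lambda x: 1,
    "P1": lambda x: x - 4,
    "P2": lambda x: x**2 - 8 * x + 12,
    "P3": lambda x: x**3 - 12 * x**2 + 41 * x - 36,
}
xs = range(1, 8)   # x = 1, 2, ... 7

# S[P_m(x) * P_n(x)] for every distinct pair, as in Eq. (12.8.1).
cross_sums = {
    (m, n): sum(polys[m](x) * polys[n](x) for x in xs)
    for m, n in combinations(polys, 2)
}

for pair, total in cross_sums.items():
    print(pair, total)
```

Each of the six sums is zero, which is the orthogonality property used in the remainder of this section.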


It can be proved [5] that any polynomial of degree p can be expressed as a
linear function of p + 1 polynomials

(12.8.2)  $P(x) = A_0\xi_0 + A_1\xi_1 + \ldots + A_p\xi_p$

where $\xi_i$ is a polynomial in x of degree i. The equality holds for N distinct
values of x, denoted by $x_\alpha$, and the polynomials $\xi_i$ are all orthogonal to each
other. If x takes the values 1, 2 ... N, the first few orthogonal polynomials are

(12.8.3)  $\xi_0 = 1$
          $\xi_1 = \lambda_1(x - \bar{x})$
          $\xi_2 = \lambda_2[(x - \bar{x})^2 - (N^2 - 1)/12]$
          $\xi_3 = \lambda_3[(x - \bar{x})^3 - (x - \bar{x})(3N^2 - 7)/20]$
          $\xi_4 = \lambda_4[(x - \bar{x})^4 - (x - \bar{x})^2(3N^2 - 13)/14 + 3(N^2 - 1)(N^2 - 9)/560]$

where $\bar{x} = (N + 1)/2$ and the $\lambda$'s are constants chosen so as to make the $\xi_i$
integers (as small as possible) for all the N values of x. Thus if N = 7 we have
$\bar{x} = 4$ and the constants are: $\lambda_1 = 1$, $\lambda_2 = 1$, $\lambda_3 = 1/6$, $\lambda_4 = 7/12$.
The sets of values of these polynomials for N = 7 are given in the following
table:

TABLE 12.3

x     ξ1     ξ2     ξ3     ξ4
1     -3     5      -1     3
2     -2     0      1      -7
3     -1     -3     1      1
4     0      -4     0      6
5     1      -3     -1     1
6     2      0      -1     -7
7     3      5      1      3

On comparing Tables 12.2 and 12.3 it is clear that $\xi_1$, $\xi_2$, and $\xi_3$ are the
same as the polynomials previously called $P_1$, $P_2$ and $P_3$, except that they are
now multiplied by the corresponding $\lambda$'s. All the polynomials with even subscripts
(like $\xi_2$ and $\xi_4$) have a set of values which is symmetric about the middle,
while all those with odd subscripts (like $\xi_1$ and $\xi_3$) are skew-symmetric. It is
therefore unnecessary in a table to record all the values, and usually the lower
half of the table (with the middle line when N is odd) is all that is actually
printed. For Table 12.3, this would be the values from x = 4 to x = 7.

If a polynomial of degree p is to be fitted to a set of N observations $(x_\alpha, y_\alpha)$,
the method of least squares applied to the equation $y_c = P(x)$, where P(x) is
given by Eq. (2), leads to the set of normal equations:

(12.8.4)  $N A_0 + S(\xi_{1\alpha})A_1 + \ldots + S(\xi_{p\alpha})A_p = S(y_\alpha)$
          $S(\xi_{1\alpha})A_0 + S(\xi_{1\alpha}^2)A_1 + \ldots + S(\xi_{1\alpha}\xi_{p\alpha})A_p = S(y_\alpha\xi_{1\alpha})$
          . . .
          $S(\xi_{p\alpha})A_0 + \ldots + S(\xi_{p\alpha}^2)A_p = S(y_\alpha\xi_{p\alpha})$

However, because of the orthogonal property of these polynomials (including
$\xi_0 = 1$), all the terms but one on the left-hand side vanish in each of these
equations. The set therefore reduces to

(12.8.5)  $A_i\, S(\xi_{i\alpha}^2) = S(y_\alpha\xi_{i\alpha}), \quad i = 0, 1 \ldots p$

from which the $A_i$ are immediately obtainable.

The sum of squares of the residuals is given by

(12.8.6)  $S(v_\alpha^2) = S(y_\alpha^2) - A_0 S(y_\alpha) - A_1 S(y_\alpha\xi_{1\alpha}) - \ldots - A_p S(y_\alpha\xi_{p\alpha})$

Since $A_0 = S(y_\alpha)/N = \bar{y}$, the first two terms of $S(v_\alpha^2)$ give the total S.S.
about the mean. The third term gives the reduction due to linear regression,
the fourth term the additional reduction due to quadratic regression, and so on.

The numerical work of calculating the $A_i$ is greatly facilitated by tables
giving the values of $\xi_1$ to $\xi_5$ for different values of N. Such tables up to N = 75
may be found in Fisher and Yates' Statistical Tables (Oliver and Boyd). More
extensive tables up to N = 104 have been given by Anderson and Houseman [6].


EXAMPLE 3 Suppose it is required to fit polynomials up to the fourth degree
to the data of Example 2, § 12.7. We replace x by u = (x + 5)/10, so that u
takes the values 1, 2 ... 10. The values of $\xi_1$, $\xi_2$, $\xi_3$, $\xi_4$ for N = 10 are read
from the tables.

TABLE 12.4

x     u     y      ξ1    ξ2    ξ3    ξ4
5     1     10.0   -9    6     -42   18
15    2     8.1    -7    2     14    -22
25    3     9.3    -5    -1    35    -17
35    4     12.1   -3    -3    31    3
45    5     13.6   -1    -4    12    18
55    6     17.5   1     -4    -12   18
65    7     20.0   3     -3    -31   3
75    8     24.0   5     -1    -35   -17
85    9     30.0   7     2     -14   -22
95    10    42.5   9     6     42    18

We calculate $S(y_\alpha) = 187.1$, $S(y_\alpha^2) = 4571.97$, $S(y_\alpha\xi_{1\alpha}) = 546.9$, $S(y_\alpha\xi_{2\alpha}) = 137.2$,
$S(y_\alpha\xi_{3\alpha}) = 252.2$, $S(y_\alpha\xi_{4\alpha}) = 196.8$. The values of $S(\xi_1^2) = 330$, $S(\xi_2^2) = 132$,
$S(\xi_3^2) = 8580$, $S(\xi_4^2) = 2860$ are read from the tables, as are also the values
$\lambda_1 = 2$, $\lambda_2 = 1/2$, $\lambda_3 = 5/3$, $\lambda_4 = 5/12$. Then

$A_0 = 187.1/10 = 18.71$,         $A_0 S(y_\alpha) = 3500.64$
$A_1 = 546.9/330 = 1.6573$,       $A_1 S(y_\alpha\xi_{1\alpha}) = 906.38$
$A_2 = 137.2/132 = 1.0394$,       $A_2 S(y_\alpha\xi_{2\alpha}) = 142.61$
$A_3 = 252.2/8580 = 0.029394$,    $A_3 S(y_\alpha\xi_{3\alpha}) = 7.41$
$A_4 = 196.8/2860 = 0.068811$,    $A_4 S(y_\alpha\xi_{4\alpha}) = 13.54$

The polynomial is

(12.8.7)  $y_c = 18.71 + 1.6573\xi_1 + 1.0394\xi_2 + 0.029394\xi_3 + 0.068811\xi_4$

where, by Eq. (3) with $\bar{u} = 5.5$ and N = 10,

(12.8.8)  $\xi_1 = 2(u - 5.5)$
          $\xi_2 = \frac{1}{2}[(u - 5.5)^2 - 8.25]$
          $\xi_3 = \frac{5}{3}[(u - 5.5)^3 - 14.65(u - 5.5)]$
          $\xi_4 = \frac{5}{12}[(u - 5.5)^4 - 20.5(u - 5.5)^2 + 48.2625]$

The best-fitting straight line is given by the first two terms only of Eq. (7), the
best-fitting parabola by the first three terms, and so on. On replacing u by
(x + 5)/10 we recover the results of § 12.7.
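
The stepwise character of the method is easily seen in a short script. The sketch below applies Eq. (12.8.5) one degree at a time to the data of this example, using the tabulated $\xi$ values for N = 10, and records for each degree the reduction in the sum of squares, the S.S. remaining about the fitted curve, and the corresponding F ratio. The names `xi`, `coef` and `steps` are assumptions of this sketch, and agreement with the printed figures is to within rounding only.

```python
y = [10.0, 8.1, 9.3, 12.1, 13.6, 17.5, 20.0, 24.0, 30.0, 42.5]

# Tabulated orthogonal polynomial values for N = 10 (degrees 1 to 4).
xi = {
    1: [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9],
    2: [6, 2, -1, -3, -4, -4, -3, -1, 2, 6],
    3: [-42, 14, 35, 31, 12, -12, -31, -35, -14, 42],
    4: [18, -22, -17, 3, 18, 18, 3, -17, -22, 18],
}

N = len(y)
ybar = sum(y) / N
coef = {0: ybar}                                   # A_0 is the mean of y
ss_about = sum((yi - ybar) ** 2 for yi in y)       # total S.S. about the mean

steps = []   # (degree, reduction in S.S., S.S. about the curve, F ratio)
for degree in sorted(xi):
    s_y_xi = sum(yi * zi for yi, zi in zip(y, xi[degree]))
    s_xi2 = sum(zi * zi for zi in xi[degree])
    coef[degree] = s_y_xi / s_xi2                  # Eq. (12.8.5)
    reduction = coef[degree] * s_y_xi              # one term of Eq. (12.8.6)
    ss_about -= reduction
    d_f = N - degree - 1
    steps.append((degree, reduction, ss_about, reduction / (ss_about / d_f)))

for row in steps:
    print(row)
```

Adding a further degree never alters the coefficients already found, which is the advantage claimed for the method at the start of this section.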




The analysis of variance is set out in Table 12.5.

TABLE 12.5

Variation                             S.S.      D.F.   M.S.
Total                                 1071.33   9
Linear regression                     906.38    1      906.38
About linear regression               164.95    8      20.62
Additional for quadratic regression   142.61    1      142.61
About quadratic regression            22.34     7      3.19
Additional for cubic regression       7.41      1      7.41
About cubic regression                14.93     6      2.49
Additional for quartic regression     13.54     1      13.54
About quartic regression              1.39      5      0.28

Compared with the deviation about the cubic regression line, the additional
sum of squares for cubic regression is not significant (F = 3.0 with 1 and 6 d.f.).
However, compared with the deviation about quartic regression, the additional
S.S. for quartic regression is highly significant (F = 48, with 1 and 5 d.f.).
This indicates that the cubic curve is not appreciably better than the parabola,
but the quartic curve is a much better fit than both.

Of course, by using a ninth degree curve we could fit the given 10 points
exactly, but such a complicated curve is obviously not desirable. We have to
compromise between the desire for simplicity and the desire to get a good fit.
The second-degree curve (the parabola) is probably as satisfactory as any
polynomial in this example.

The matrix which corresponds to A in Eq. (12.1.7) is now diagonal, so that
it is very easy to invert. In fact

$A = \begin{pmatrix} N & 0 & \cdots & 0 \\ 0 & S(\xi_{1\alpha}^2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & S(\xi_{p\alpha}^2) \end{pmatrix}$

so that $a^{ii} = [S(\xi_{i\alpha}^2)]^{-1}$, and the estimated variance of the coefficient $A_i$,
i = 0, 1, 2 ... p, is given by

(12.8.9)  $V(A_i) = a^{ii}\,\dfrac{S\,v_\alpha^2}{N - p - 1}$

Thus in Eq. (7), the estimated standard error of $A_0$ is $[1.39/50]^{1/2} = 0.17$
and that of $A_1$ is $[1.39/(5 \times 330)]^{1/2} = 0.029$. These values apply, of course,
only if a fourth-degree curve is fitted, since the residuals $v_\alpha$ relate to such a
curve. The sum $S\,v_\alpha^2$ is given by Eq. (6), with p = 4.


* 12.9 A Test for Linearity of Regression with Grouped Variates When the
number of observations is sufficiently large to warrant grouping, we may be
presented with a two-way table of data like that in § 11.4, Example 2. Each
column (or x-array) includes all the observations with x-values lying in one
particular class-interval, and these are all assumed to have the same value,
namely that at the centre of the interval.

Within one column the y-values are also grouped in classes, and in each
class (i.e., in each row of the table) all the observations are assumed to have
the central value of y. We suppose that there are p columns in the table and
that the total frequency in the ith column is $f_i$, where $\sum f_i = N$.

We can then define for each column the arithmetic mean $\bar{y}_i$ of the y-values
in that column. If these column means are plotted against the central x-values
for the columns, the result of joining them is a sort of empirical trend line. In
fact, if a straight line is fitted by least squares to this set of column means, each
one being weighted with the corresponding column frequency, the result is
precisely the ordinary regression line of y on x.

The sum of squares of deviations from the column mean within one column
is $S_\alpha (y_{i\alpha} - \bar{y}_i)^2$, where the $y_{i\alpha}$ are the y values in the ith column ($\alpha = 1, 2 \ldots f_i$).
The ratio of this sum, added up for all columns, to the total sum of squares for y
about the over-all mean $\bar{y}$, defines a quantity called the correlation ratio of y
on x ($E_{yx}$) by the relation:

(12.9.1)  $1 - E_{yx}^2 = \dfrac{\sum_i S_\alpha (y_{i\alpha} - \bar{y}_i)^2}{\sum_i S_\alpha (y_{i\alpha} - \bar{y})^2}$

The denominator is simply $(N - 1)s_y^2 = S_y$ (say), since the sum is over all y
values in the table. The above expression may be compared with one obtained
in Chapter 11 (see Eq. (11.5.2)), namely,

(12.9.2)  $1 - r^2 = \dfrac{\sum_i S_\alpha (y_{i\alpha} - y_{ci})^2}{S_y}$

where $y_{ci}$ is the calculated value of y for the centre of the ith column, according
to the linear regression equation of y on x. This indicates that $E_{yx}$ is similar
in nature to the Pearson coefficient r. If the regression is in fact nearly linear,
the two agree quite closely, but the more the regression (as indicated by the
line of column means) departs from a straight line the more do $E_{yx}$ and r differ.
The difference may be used to estimate the significance of an apparent departure
from linearity.

The quantity $E_{yx}^2$ may also be written

(12.9.3)  $E_{yx}^2 = S_c/S_y$

where $S_c$ is the weighted sum of squares of the column means about the over-all
mean and $S_y$ is the total sum of squares for y about the over-all mean. That is,

(12.9.4)  $S_c = \sum_i f_i(\bar{y}_i - \bar{y})^2$
          $S_y = \sum_i S_\alpha (y_{i\alpha} - \bar{y})^2 = (N - 1)s_y^2$

In the notation of § 11.4, with the auxiliary variables u and v,

(12.9.5)  $S_c = c_y^2\left[\sum_i V_i^2/f_i - N\bar{v}^2\right]$
          $S_y = c_y^2\left[\sum_v v^2 f_v - N\bar{v}^2\right] = (N - 1)c_y^2 s_v^2$

where $V_i$ is the sum of the v-values in the ith column, $\bar{v} = \sum_v v f_v/N$, and $c_y$ is
the class interval for y. This gives the most convenient formula in practice for
calculating $E_{yx}$.

Analogously to Eq. (3) we can rewrite Eq. (2) in the form

(12.9.6)  $r^2 = S_r/S_y$

where $S_r = \sum_i f_i(y_{ci} - \bar{y})^2$, which is the weighted sum of squares for the
calculated linear regression values $y_{ci}$. Therefore,

(12.9.7)  $(E_{yx}^2 - r^2)S_y = S_c - S_r = \sum_i f_i[(\bar{y}_i - \bar{y})^2 - b^2(x_i - \bar{x})^2]$

since $y_{ci} - \bar{y} = b(x_i - \bar{x})$, and so represents that part of the sum of squares for
column means which is not accounted for by linear regression. If this part is
large compared with the sum of squares within columns about the column means,
$\sum_i S_\alpha (y_{i\alpha} - \bar{y}_i)^2 = (1 - E_{yx}^2)S_y$, we may reasonably reject the hypothesis that
the true regression is linear. The test ratio is therefore $(E_{yx}^2 - r^2)/(1 - E_{yx}^2)$.

If the values of y within each column are normally distributed with a variance
$\sigma^2$ common to all the columns, then $S_\alpha (y_{i\alpha} - \bar{y}_i)^2$ for any column is distributed
as $\chi^2\sigma^2$ with $f_i - 1$ degrees of freedom. Therefore $(1 - E_{yx}^2)S_y$ is distributed as
$\chi^2\sigma^2$ with $\sum_i (f_i - 1) = N - p$ degrees of freedom.

Furthermore, the p values of $\bar{y}_i$ are each normal with variance $\sigma^2/f_i$, so that
$\sum_i f_i(\bar{y}_i - \bar{y})^2$ is distributed as $\chi^2\sigma^2$ with p - 1 d.f. Also

$\sum_i f_i(y_{ci} - \bar{y})^2 = b^2 \sum_i f_i(x_i - \bar{x})^2 = b^2(N - 1)s_x^2$

Now, as shown in § 11.5, b is normal with variance $\sigma^2/[(N - 1)s_x^2]$, so that
$b^2(N - 1)s_x^2$ is distributed as $\chi^2\sigma^2$ with 1 d.f. It follows from Eq. (7) and
Theorem 4.3 that $(E_{yx}^2 - r^2)S_y$ is distributed as $\chi^2\sigma^2$ with p - 2 d.f., and is
independent of $S_r$. The ratio

(12.9.8)  $F = \dfrac{N - p}{p - 2} \cdot \dfrac{E_{yx}^2 - r^2}{1 - E_{yx}^2}$

is therefore distributed as Snedecor's F with p - 2 and N - p degrees of
freedom. A significant value of F indicates a significant departure from linearity.
The test is one-tailed.


EXAMPLE 4 The following table represents some results on the relation
between the percentage protein in wheat (y) and the yield in bushels per acre (x)
for a set of 91 experimental plots. We suppose that it is desired to predict y for
a given x, so that the x values may be regarded as fixed.

TABLE 12.6 (marginal totals)

Row totals:

v         5    4    3    2    1    0    -1    -2    -3    -4    Total
f_v       1    0    4    4    9    24   12    12    20    5     91
v f_v     5    0    12   8    9    0    -12   -24   -60   -20   -82
v² f_v    25   0    36   16   9    0    12    48    180   80    406

Column totals:

u         -3      -2      -1      0       1       2       3       4       Total
f_u       10      12      22      13      7       14      11      2       91
u f_u     -30     -24     -22     0       7       28      33      8       0
u² f_u    90      48      22      0       7       56      99      32      354
V         16      16      -9      -19     -14     -37     -29     -6      -82
Vu        -48     -32     9       0       -14     -74     -87     -24     -270
V/f_u     1.60    1.33    -0.41   -1.46   -2.00   -2.64   -2.64   -3.00
V²/f_u    25.60   21.33   3.68    27.77   28.00   97.79   76.45   18.00   298.62

The coded u and v values are given by

$u = (x - x_0)/5, \quad v = y - 13.45$

where $x_0$ is the centre of the x-class taken as origin. From the table and Eq. (5) we obtain

$S_c = 298.62 - 91\left(\dfrac{-82}{91}\right)^2 = 224.73$

$S_y = 406 - 91\left(\dfrac{-82}{91}\right)^2 = 332.11$

whence

$E_{yx}^2 = 0.677$

Also the sum of squares for x and the sum of products for x and y are given by

$S_x = 25(354 - 0) = 8850$
$S_{xy} = 5(-270 - 0) = -1350$

so that

$r^2 = \dfrac{(-1350)^2}{(8850)(332.11)} = 0.620$

Therefore

$F = \dfrac{N - p}{p - 2} \cdot \dfrac{E_{yx}^2 - r^2}{1 - E_{yx}^2} = \dfrac{83}{6} \cdot \dfrac{0.057}{0.323} = 2.44$

with 6 and 83 d.f. The 5% point is about 2.21 so that there is a significant
departure from linearity.
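
The test of Example 4 may be sketched in code as follows. The column frequencies, coded x-values and column totals entered below are those implied by the summary rows of Table 12.6 (the frequencies being inferred from the row of $u^2 f_u$ values, which is an assumption of this sketch), and the names `S_c`, `S_y`, `S_x`, `S_xy` are labels chosen here for the four sums. Since the text rounds $E^2$ and $r^2$ to three decimals before forming F, the script's F agrees with the printed value only to within rounding.

```python
# Summary rows for the p = 8 columns of the wheat data.
f = [10, 12, 22, 13, 7, 14, 11, 2]          # column frequencies (inferred)
u = [-3, -2, -1, 0, 1, 2, 3, 4]             # coded x-value of each column
V = [16, 16, -9, -19, -14, -37, -29, -6]    # sum of coded y-values (v) per column
sum_v2 = 406                                # sum of v^2 * f_v over the whole table
c_x, c_y = 5, 1                             # class intervals for x and y

N, p = sum(f), len(f)
sum_V = sum(V)
sum_uf = sum(ui * fi for ui, fi in zip(u, f))

# Between-column and total sums of squares for y, as in Eq. (12.9.5).
S_c = c_y ** 2 * (sum(Vi ** 2 / fi for Vi, fi in zip(V, f)) - sum_V ** 2 / N)
S_y = c_y ** 2 * (sum_v2 - sum_V ** 2 / N)

# Sum of squares for x and sum of products for x and y.
S_x = c_x ** 2 * (sum(ui ** 2 * fi for ui, fi in zip(u, f)) - sum_uf ** 2 / N)
S_xy = c_x * c_y * (sum(ui * Vi for ui, Vi in zip(u, V)) - sum_uf * sum_V / N)

E2 = S_c / S_y                              # squared correlation ratio, Eq. (12.9.3)
r2 = S_xy ** 2 / (S_x * S_y)                # squared correlation coefficient
F = (N - p) / (p - 2) * (E2 - r2) / (1 - E2)   # Eq. (12.9.8)

print(E2, r2, F)
```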

The original data (ungrouped) are given in Snedecor's Statistical Methods 
(4th ed., p. 380), where a parabola is fitted by the method of § 12.7. The difference
between the S.S. about the parabola and the S.S. about the best straight
line is significant as compared with the former S.S. itself. This confirms the 
departure from linearity. The curve of column means (in units of v) is plotted 
in Fig. 53, the necessary data being obtained from the last row but one of 
Table 12.6. For comparison the best-fitting parabola is also shown. 


* 12.10 The Distribution of the Correlation Ratio The correlation ratio $E_{yx}$
may be used as a measure of the degree to which the observations (as grouped)
tend to cluster around the curve of column means, just as Pearson's r measures
the degree of clustering around the straight regression line of Y on X. A similar,
but usually different, ratio, $E_{xy}$, measures the clustering around the curve of
row means. It may be calculated in the same way as $E_{yx}$, with x and y interchanged
throughout.

[Fig. 53. Curve of column means and fitted parabola, for data on wheat yield and protein content. Horizontal axis: wheat yield (u); vertical axis: protein content.]

On the hypotheses mentioned in the previous section, and on the assumption
that there is really no association between the variates in the parent population,
the distribution of $E_{yx}^2$ was worked out by Hotelling [7]. He showed that $E_{yx}^2$
is a beta-variate with parameters $n_1 = p - 1$ and $n_2 = N - p$. It follows
therefore that $n_2 E_{yx}^2/[n_1(1 - E_{yx}^2)]$ has the F-distribution with $n_1$ and $n_2$ d.f.

The significance of an observed $E_{yx}$ may be tested by means of Pearson's
Tables of the Incomplete Beta Function or ordinary tables of F. A special
table for large values of N, 50(1)1000, was prepared by Woo [8].

If the population correlation ratio $\eta_{yx}$ is not zero, but if we assume that in
all samples there are the same set of frequencies $f_i$, the density function for $E^2$ is

(12.10.1)  $f(E^2) = e^{-\lambda}\, H(\lambda E^2)\, \dfrac{(E^2)^{a-1}(1 - E^2)^{b-1}}{B(a, b)}$

where $E^2$ is written for $E_{yx}^2$, $\eta^2$ for $\eta_{yx}^2$, $a = n_1/2$, $b = n_2/2$, and $\lambda =
N\eta^2/[2(1 - \eta^2)]$. The function H, with argument $\lambda E^2$, is called the confluent
hypergeometric function, defined by the series:

(12.10.2)  $H(\lambda E^2) = 1 + \dfrac{a + b}{1!\,a}\,\lambda E^2 + \dfrac{(a + b)(a + b + 1)}{2!\,a(a + 1)}\,(\lambda E^2)^2 + \ldots$

Tang [9] has tabulated the distribution function for $E^2$, namely, the probability
that $E^2 < E_\alpha^2$ for certain values of $\lambda$, $E_\alpha^2$ being fixed by the condition:

(12.10.3)  $\int_{E_\alpha^2}^{1} f(E^2 \mid \lambda = 0)\, dE^2 = \alpha$

for $\alpha$ = 0.01 or 0.05. The tabulated probability is therefore that of an error
of the second kind (§ 6.6), the chance of an error of the first kind, namely, a
wrong rejection of the null hypothesis that $\lambda = 0$, being fixed at the value $\alpha$.

It may be noted that $E^2$ has the same distribution as x in § 9.12, where x is
the ratio of the S.S. between treatments to the total S.S. in a one-way
analysis-of-variance problem. The difference is that the number of treatments is replaced
by the number of columns and the S.S. between treatments by the S.S. between
column means. The null hypothesis of no treatment effects becomes the hypothesis
that in the population the column means are all equal. Under this hypothesis,
and if the variance of Y is the same within each column, $S_c/\sigma^2$ (where $S_c$ is
the S.S. between column means) has the $\chi^2$ distribution with $n_1$ (= p - 1) d.f.
Under the alternative hypothesis (that $\eta$, and therefore $\lambda$, is not zero) it has the
non-central chi-square distribution (Appendix A.13) with non-centrality parameter

(12.10.4)  $\lambda = \dfrac{N\eta^2}{2(1 - \eta^2)} = \dfrac{N(\sigma_y^2 - \sigma^2)}{2\sigma^2}$

where $\sigma_y^2$ is the variance of y in the population and $\sigma^2$ is the variance in each
column about the column mean. When $\lambda = 0$ this distribution becomes the
ordinary (central) chi-square distribution.


* 12.11 Exponential Regression It is not uncommon for a variable Y to increase
or decrease with time at an approximately uniform percentage rate. This holds,
for example, for money accumulating at a fixed rate of compound interest or
for a bacterial population growing in an ample supply of culture medium. In
fact this type of increase is often referred to as "the law of growth." It may be
expressed mathematically by the relation:

(12.11.1)  $dY = kY\,dX$

or, in integral form,

(12.11.2)  $Y = Ae^{kX}$

If we wish to fit such an exponential curve by least squares to a set of N
pairs of observed values $(x_\alpha, y_\alpha)$, $\alpha = 1, 2 \ldots N$, we have to calculate A and k
from the relation:

(12.11.3)  $S\,(y_\alpha - Ae^{kx_\alpha})^2 = \text{minimum}$

from which we get, by differentiating,

(12.11.4)  $S\,e^{kx_\alpha}(Ae^{kx_\alpha} - y_\alpha) = 0$
           $S\,x_\alpha e^{kx_\alpha}(Ae^{kx_\alpha} - y_\alpha) = 0$

The exact solution of these equations for the unknowns A and k is tedious
and time-consuming. It is customary instead to write Eq. (2) in the form

(12.11.5)  $\log Y = \log A + kX$

and to fit a straight line by the method of Chapter 11 to the observed values of
$\log y_\alpha$ and $x_\alpha$. This, of course, means that the sum of squares of deviations for
log Y is minimized instead of the corresponding S.S. for Y. If the standard
deviation for Y is proportional to Y itself, as seems to be nearly true for many
types of data in economics, this procedure is quite reasonable, since
$\delta(\log Y) = \delta Y/Y$ and the standard deviation of log Y is therefore approximately
constant. If, however, there is reason to believe that the standard deviation
of Y itself is constant, the effect of the customary procedure is to give undue
weight to the smaller values of Y.

A method of allowing (at least approximately) for this effect is to fit a straight
line to the observed log y, but to weight each observation proportionately to y.
The weighted least-squares condition is

(12.11.6)  $S\,(\log y_\alpha - a - kx_\alpha)^2 \cdot y_\alpha = \min$

where a = log A in Eq. (5). This furnishes the normal equations:

(12.11.7)  $a\,S\,y_\alpha + k\,S(x_\alpha y_\alpha) = S(y_\alpha \log y_\alpha)$
           $a\,S(x_\alpha y_\alpha) + k\,S(x_\alpha^2 y_\alpha) = S(x_\alpha y_\alpha \log y_\alpha)$

from which a and k can be found. If common logarithms instead of natural
logarithms are used, the equation for Y will be of the form $Y = A \cdot 10^{kX}$ instead
of that in Eq. (2).


TABLE 12.7

x     y       log10 y
6     0.029   -1.538
7     0.052   -1.284
8     0.079   -1.102
9     0.125   -0.903
10    0.181   -0.742
11    0.261   -0.583
12    0.425   -0.372
13    0.738   -0.132
14    1.130   0.053
15    1.882   0.275
16    2.812   0.449

EXAMPLE 5 (Snedecor [11]) In Table 12.7, x represents the age in days of
chick embryos and y the dry weight in grams.
The calculated $y_c$ for a given x, obtained by fitting a straight line to the
weighted values of log y, is $y_c = 0.001875(10)^{0.1989x} = 0.001875e^{0.4581x}$.

Without weighting, the result is $y_c = 0.002046e^{0.4511x}$, while the exact
least-squares solution is $y_c = 0.001895e^{0.4573x}$. The method of weighting the observed
values of log y gives (at least in this example) a very good approximation.
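
The weighted and unweighted fits of this example can be reproduced with the sketch below. The function `weighted_line` is a hypothetical helper that solves the two normal equations for a straight line with arbitrary weights; with weights equal to y it corresponds to Eq. (12.11.7), and with unit weights it gives the ordinary fit. The three-figure common logarithms of Table 12.7 are used, as in the text, so the results should match the printed ones to within rounding.

```python
import math

x = list(range(6, 17))
y = [0.029, 0.052, 0.079, 0.125, 0.181, 0.261, 0.425, 0.738, 1.130, 1.882, 2.812]
log_y = [-1.538, -1.284, -1.102, -0.903, -0.742, -0.583, -0.372, -0.132,
         0.053, 0.275, 0.449]                      # common logarithms, Table 12.7


def weighted_line(xs, zs, ws):
    """Weighted least-squares line z = a + k*x; returns (a, k)."""
    sw = sum(ws)
    swx = sum(w * xi for w, xi in zip(ws, xs))
    swxx = sum(w * xi * xi for w, xi in zip(ws, xs))
    swz = sum(w * zi for w, zi in zip(ws, zs))
    swxz = sum(w * xi * zi for w, xi, zi in zip(ws, xs, zs))
    det = sw * swxx - swx ** 2
    a = (swz * swxx - swx * swxz) / det
    k = (sw * swxz - swx * swz) / det
    return a, k


a_w, k_w = weighted_line(x, log_y, y)                  # weights proportional to y
a_u, k_u = weighted_line(x, log_y, [1.0] * len(x))     # no weighting

# Convert from base 10 to the form Y = A * exp(k * X).
A_w, k_w_nat = 10 ** a_w, k_w * math.log(10)
A_u, k_u_nat = 10 ** a_u, k_u * math.log(10)

print(A_w, k_w, k_w_nat)
print(A_u, k_u, k_u_nat)
```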

In some problems the data follow more or less a modified exponential curve,
expressed by

(12.11.8)  $Y = C + Ae^{kX}$

The exact least-squares solution of Eq. (8) for a set of sample values $x_\alpha$, $y_\alpha$
is even more difficult than for Eq. (2), and when plotted on semilog graph paper
the points $(x_\alpha, y_\alpha)$ do not lie nearly on a straight line. However it is possible
to use a graphical method due to Cowden [10] to obtain approximate values of
C, A and k, and these may be improved, if necessary, by using Seidel's process
(§ 12.12).

The data are plotted on ordinary or semilog graph paper and a tentative
trend line is drawn in by hand. Three equidistant ordinates $Y_0$, $Y_1$ and $Y_2$ of
the curve at convenient values of X (X - h, X and X + h, say) are measured,
and C is estimated from the relation:

(12.11.9)  $C = \dfrac{Y_0 Y_2 - Y_1^2}{Y_0 + Y_2 - 2Y_1}$

Values of $y_\alpha - C$ are now plotted on semilog graph paper. If C is correct, these
points should lie close to a straight line; if there still appears to be some curvature
the value of C may be readjusted slightly by trial. From the resulting straight
line, A and k may be estimated, A being the ordinate at X = 0 and $e^k$ the
ratio of the ordinates at X = 1 and X = 0.
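
The three-ordinate relation for C follows from the fact that, for equally spaced abscissae, the quantities $Y_0 - C$, $Y_1 - C$, $Y_2 - C$ are in geometric progression. A minimal sketch is given below; the function names and the parameter values of the test curve are hypothetical, chosen only to show that the relation recovers C exactly when the three ordinates lie on a true modified exponential.

```python
import math


def three_point_C(y0, y1, y2):
    """Estimate C in Y = C + A*exp(k*X) from three equidistant ordinates."""
    return (y0 * y2 - y1 ** 2) / (y0 + y2 - 2 * y1)


def modified_exponential(x, C, A, k):
    return C + A * math.exp(k * x)


# Hypothetical curve and spacing, for illustration only.
C_true, A_true, k_true = 5.0, 2.0, 0.3
X_mid, h = 3.0, 2.0
y0, y1, y2 = (modified_exponential(X, C_true, A_true, k_true)
              for X in (X_mid - h, X_mid, X_mid + h))

C_est = three_point_C(y0, y1, y2)

# With C removed, the ordinates grow by the factor exp(k*h) per step of h.
k_est = math.log((y2 - C_est) / (y1 - C_est)) / h
A_est = (y1 - C_est) / math.exp(k_est * X_mid)

print(C_est, A_est, k_est)
```

With ordinates read from a hand-drawn trend line the estimate will of course be only approximate, and the denominator becomes small when the curve is nearly straight over the chosen interval.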


* 12.12 Seidel's Method of Successive Approximations Sometimes it is convenient
to obtain approximate values of the regression coefficients from a graph.
With these as a start, Seidel's method permits better values to be obtained by a
least-squares procedure.

Suppose the regression curve is of the form

$Y = f(x, \beta_0, \beta_1)$

where $\beta_0$ and $\beta_1$ are the true parameters. (The method can easily be extended
to more than two parameters.) If the preliminary approximations are $b_0$ and $b_1$,
let $\delta b_0 = \beta_0 - b_0$ and $\delta b_1 = \beta_1 - b_1$. Then if the approximations are reasonably
good we should be able to neglect squares, products and higher powers of
$\delta b_0$ and $\delta b_1$, so that

(12.12.1)  $Y = f(x, b_0 + \delta b_0, b_1 + \delta b_1) \approx f(x, b_0, b_1) + \dfrac{\partial f}{\partial b_0}\delta b_0 + \dfrac{\partial f}{\partial b_1}\delta b_1$

where $\partial f/\partial b_0$, $\partial f/\partial b_1$ mean the partial derivatives of $f(x, \beta_0, \beta_1)$ with respect to
$\beta_0$ and $\beta_1$, evaluated at $\beta_0 = b_0$ and $\beta_1 = b_1$. Then $\delta b_0$ and $\delta b_1$ are selected so
as to minimize the sum of squares of residuals

$S\left[y_\alpha - f(x_\alpha, b_0, b_1) - \dfrac{\partial f}{\partial b_0}\delta b_0 - \dfrac{\partial f}{\partial b_1}\delta b_1\right]^2$

The normal equations for $\delta b_0$ and $\delta b_1$ may therefore be written

(12.12.2)  $S\left\{\dfrac{\partial f}{\partial b_0}\left[\dfrac{\partial f}{\partial b_0}\delta b_0 + \dfrac{\partial f}{\partial b_1}\delta b_1 + f - y_\alpha\right]\right\} = 0$
           $S\left\{\dfrac{\partial f}{\partial b_1}\left[\dfrac{\partial f}{\partial b_0}\delta b_0 + \dfrac{\partial f}{\partial b_1}\delta b_1 + f - y_\alpha\right]\right\} = 0$

where, in f and its partial derivatives, x is replaced by $x_\alpha$. On solving these
equations and adding $\delta b_0$ to $b_0$ and $\delta b_1$ to $b_1$, the preliminary approximations
are improved. The process can be repeated if necessary and usually converges
quite rapidly.


EXAMPLE 6  Suppose that we have plotted on semilog paper the data in Example 5 above and have drawn by eye an approximately best-fitting straight line. From the line we estimate $k_1 = 0.20$ and $a_1 = \log A_1 = -2.70$, these being first approximations to $k$ and $a\ (= \log A)$ respectively. Then

$\log Y = a + kx$

so that $\partial f/\partial a = 1$, $\partial f/\partial k = x$. The weighted normal equations, weighted according to the values of $y_\alpha$, are

$S[y_\alpha(\log y_\alpha - a_1 - k_1 x_\alpha - \delta a_1 - x_\alpha\,\delta k_1)] = 0$
$S[y_\alpha x_\alpha(\log y_\alpha - a_1 - k_1 x_\alpha - \delta a_1 - x_\alpha\,\delta k_1)] = 0$

On substituting the values of $x_\alpha$ and $y_\alpha$, these become

$7.714\,\delta a_1 + 110.712\,\delta k_1 = -0.32786$
$110.712\,\delta a_1 + 1619.18\,\delta k_1 = -4.73783$

from which

$\delta a_1 = -0.0271$, $\delta k_1 = -0.0011$

Therefore the improved values of $a$ and $k$ are

$a_2 = -2.70 - 0.0271 = -2.7271$
$k_2 = 0.20 - 0.0011 = 0.1989$

The fitted curve is

$\log Y = -2.7271 + 0.1989x$

or

$Y = 0.001875(10)^{0.1989x}$

which agrees with the result quoted in § 12.11.
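As a check on the arithmetic of Example 6, the pair of normal equations can be solved by Cramer's rule. This is a minimal sketch; the helper name `solve_2x2` is hypothetical, and the coefficients are simply those printed in the example.

```python
def solve_2x2(a11, a12, a21, a22, c1, c2):
    """Solve a11*u + a12*v = c1 and a21*u + a22*v = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    u = (c1 * a22 - a12 * c2) / det
    v = (a11 * c2 - a21 * c1) / det
    return u, v

# Coefficients of the weighted normal equations in Example 6
delta_a, delta_k = solve_2x2(7.714, 110.712, 110.712, 1619.18, -0.32786, -4.73783)

a2 = -2.70 + delta_a        # improved value of a = log A
k2 = 0.20 + delta_k         # improved value of k
A2 = 10 ** a2               # corresponding value of A
```

The two equations are nearly proportional, so the corrections are sensitive to rounding in the coefficients; carrying several significant figures, as the example does, is advisable.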




PROBLEMS 
A. (§§ 12.1-12.3)
1. Write out the normal equations (12.1.5) for the case of two predictors ($X_0 = 1$, $X_1 = X$, $X_2 = Z$) and show that these equations can be put into the form:
$\bar{y} = b_0 + b_1\bar{x} + b_2\bar{z}$
$s_{xy} = b_1 s_x^2 + b_2 s_{xz}$
$s_{yz} = b_1 s_{xz} + b_2 s_z^2$
where $\bar{x}$, $\bar{y}$, $\bar{z}$ are the means, and $s_x^2$, $s_y^2$, $s_z^2$, $s_{xy}$, $s_{xz}$, $s_{yz}$ are the variances and covariances, for the variates X, Y and Z.
2. Show that the equation of the regression plane of Y on X and Z may be written
$(y_c - \bar{y})(d_1/s_y) + (x - \bar{x})(d_2/s_x) + (z - \bar{z})(d_3/s_z) = 0$ where $d_1$, $d_2$, $d_3$ are the cofactors of the elements of the first row in the determinant

$d = \begin{vmatrix} 1 & r_{xy} & r_{zy} \\ r_{xy} & 1 & r_{xz} \\ r_{zy} & r_{xz} & 1 \end{vmatrix}$

Hint: See Problem 1. The quantities $r_{xy}$, $r_{xz}$, $r_{zy}$ are the respective coefficients of correlation.

3. (Hooker) For a certain district in England, records were kept over 20 years of the following variates:

Y = seed-hay crop (cwt/acre)

X = spring rainfall (inches)

Z = accumulated temperature above 42°F in spring.

From the data the following statistics were calculated:
$\bar{x} = 4.91$, $\bar{y} = 28.02$, $\bar{z} = 594$, $s_x = 1.10$, $s_y = 4.42$, $s_z = 85$, $r_{xy} = 0.80$, $r_{xz} = -0.56$, $r_{yz} = -0.40$.

Use the result of Problem 2 to find the regression of hay crop on spring rainfall and accumulated temperature.

4. At an experimental farm in Alberta, records were kept over 35 years of the evaporation (Y) from an open tank. It was thought that this might be related to the date of observation (X) and to the annual rainfall (Z). With the date coded as an integer from 1 to 35, the following observations were recorded: $S x_\alpha = 630$, $S x_\alpha^2 = 14{,}910$, $S z_\alpha = 286.90$, $S z_\alpha^2 = 2563.47$, $S y_\alpha = 452.53$, $S y_\alpha^2 = 5980.79$, $S x_\alpha y_\alpha = 7814.63$, $S z_\alpha y_\alpha = 3626.96$, $S x_\alpha z_\alpha = 5287.95$.

Write out the matrices A and g and find, by Cramer's rule, the predicting equation for Y in terms of X and Z.

5. (P. O. Johnson) As part of a study dealing with the prediction of freshman achievement in college, the following scores were noted for each of a random sample of 50 students:

Y = honor-point ratio at end of freshman year,
$X_1$ = score on an English test,
$X_2$ = score on an algebra test,
$X_3$ = percentile ranking at high school graduation, transformed to probits (this provides in effect a normal variate with mean 5).

From the data obtained,

$S y_\alpha = 36.19$    $S y_\alpha^2 = 40.6393$    $N = 50$
$S x_{1\alpha} = 4802$    $S x_{2\alpha} = 1560$    $S x_{3\alpha} = 248.22$
$S x_{1\alpha}^2 = 487{,}798$    $S x_{2\alpha}^2 = 66{,}942$    $S x_{3\alpha}^2 = 1260.7630$
$S x_{1\alpha} y_\alpha = 3533.58$    $S x_{2\alpha} y_\alpha = 1213.74$    $S x_{3\alpha} y_\alpha = 189.1539$
$S x_{1\alpha} x_{2\alpha} = 157{,}863$    $S x_{1\alpha} x_{3\alpha} = 23{,}926.22$    $S x_{2\alpha} x_{3\alpha} = 7804.57$

Write out and solve the normal equations and so obtain the equation for predicting Y from $X_1$, $X_2$ and $X_3$.


B. (§§ 12.4-12.6)

1. In Problem A-4, invert the matrix A. Use Eq. (12.5.6) to find the sum of squares of residuals, and so obtain an estimate of the variance $\sigma^2$ of Y about the true regression plane.

2. In Problem A-4, obtain 90% confidence intervals for the regression coefficients $\beta_0$, $\beta_1$ and $\beta_2$. (For 32 degrees of freedom, the required value of t is 1.694.)

3. Write down the matrix A for the data of Problem A-5. Calculate the diagonal terms in the inverse matrix and hence obtain the variances of the three regression coefficients $b_1$, $b_2$ and $b_3$ in the predicting equation for Y. Show that only the regression on $X_3$ is significant.

4. From the data of Problem A-3, calculate approximately the values of the matrix elements aij, and so obtain an estimate of the variance $\sigma^2$ about the true regression plane. Estimate also the variance of the predicted value of Y for a new pair of observed values of X and Z.


C. (§§ 12.7-12.8)
1. Fit a second-degree parabola to the following data:

x    1.0   1.5   2.0   2.5   3.0   3.5   4.0

y    1.1   1.3   1.6   2.3   2.7   3.4   4.1

2. Construct an analysis of variance table for the data in Problem C-1, and determine whether there is a significant reduction in the sum of squares about the regression line when the straight-line regression is replaced by parabolic regression.

3. If a correlation index $r_c$ is defined by $r_c^2 = 1 - (S v_\alpha^2)/[(N - 1)s_y^2]$, where $v_\alpha$ is a residual for parabolic regression, find the value of $r_c$ for the data of Problem C-1. Compare with the Pearson coefficient of correlation for the same data.

4. (Holzinger) In the following table, X represents mean age in years for a group of men, and Y their mean vital capacity. Use the method of orthogonal polynomials to find the equation of the best-fitting cubic curve.

  X      Y        X      Y        X      Y
19.5    227     37.5    223     55.5    201
22.5    230     40.5    218     58.5    185
25.5    230     43.5    216     61.5    200
28.5    237     46.5    210     64.5    169
31.5    227     49.5    205     67.5    160
34.5    229     52.5    193     70.5    163

Note: The following extract from Anderson and Houseman's tables is relevant:

N = 18
$\xi_1$     1     3     5     7     9    11    13    15    17
$\xi_2$   -40   -37   -31   -22   -10     5    23    44    68
$\xi_3$    -8   -23   -35   -42   -42   -33   -13    20    68

$S(\xi_1^2) = 1938$, $S(\xi_2^2) = 23{,}256$, $S(\xi_3^2) = 23{,}256$
$\lambda_1 = 2$, $\lambda_2 = 3/2$, $\lambda_3 = 1/3$

5. Draw up an analysis of variance table for the regression of Problem C-4. Find estimates of the standard error for each of the coefficients of the orthogonal polynomials obtained in this problem.


D. (§§ 12.9-12.10)

1. In the following table X is the amount of irrigation water (inches) applied to a crop, and Y is the crop yield in bushels per acre. The numbers in the headings are the central values of the respective classes. Test the regression of Y on X for linearity.


X 12 15 18 21 24 27 30 

Y 

90 1 2 3 

85 2 3 5 

80 2 5 4 1 12 

75 2 4 6 1 13 

70 4 3 1 8 

65 2 3 5 

60 2 2 4 
2 4 10 19 8 5 2 50 


2. In Problem B-4 of Chapter 11, calculate the two correlation ratios $E_{yx}$ and $E_{xy}$. Does either regression (of height on weight or of weight on height) depart significantly from linearity?

3. Prove the statement in § 12.9 that if a straight line is fitted by least squares to the weighted column means in a grouped two-way table, the result is the ordinary regression line of Y on X.

4. Show that when $\lambda = 0$, the variate $E^2$ in Eq. (12.10.1) becomes a beta-variate. Show also that in this case $n_2 E^2/[n_1(1 - E^2)]$ has the F distribution with $n_1$ and $n_2$ degrees of freedom.


E. (§§ 12.11-12.12)

1. The uniform horizontal scale on a sheet of semilog paper ranges from 0 to 10. The vertical logarithmic scale (on the left side) ranges from 100 to 1000. A straight line is drawn from the upper end of the vertical scale to the midpoint of the horizontal scale. What is the equation of the line (a) in the coordinates x and y', where $y' = \log_{10} y$, (b) in the coordinates x and y?

2. A straight line is drawn on semilog paper through the points (2, 1) and (4, 100). What is the equation of this line?

3. Fit an exponential curve to the following data (a) without weighting, and (b) with weighting:


x    1    2    3    4    5


y 251 4.3 14.5 42.2 123.1 


Hint: For part (b), use Eq. (12.11.7). 

4. Prove that the exact least-squares solution of the problem of fitting the modified exponential curve $y = C + Ae^{kx}$ requires the determination of A, C and k from the equations:

$S y_\alpha = NC + A\,S e^{kx_\alpha}$
$S(y_\alpha e^{kx_\alpha}) = C\,S e^{kx_\alpha} + A\,S e^{2kx_\alpha}$
$S(y_\alpha x_\alpha e^{kx_\alpha}) = C\,S(x_\alpha e^{kx_\alpha}) + A\,S(x_\alpha e^{2kx_\alpha})$

5. Use the method of least squares to fit the curve $y = ax^2 + b/x$ to the following data:

x |     1       2       3       4

y |  -1.51    0.99    3.88    7.66


6. The logistic curve yc = а(1 + bg7)*! has been used to represent population 
growth. Fit such a curve, by Cowden's method, to the following data on the population 
(in millions) of the United States, 1790 to 1950: 


x |  1790   1800   1810   1820   1830   1840   1850   1860
y |  3.93   5.31   7.24   9.64  12.87  17.07  23.19  31.44

x |  1870   1880   1890   1900   1910    1920    1930    1940    1950
y | 39.82  50.16  62.95  76.00  91.97  105.71  122.78  131.67  150.70


Hint: Write the equation 1/у = A + Ват, where A = Ша, В = bja, д = e”, and plot 
values of 1/y instead of у. Use coded x values. 
7. Show that the Gompertz curve, уг = ab*', may be approximately fitted by 
Cowden's method if log y is plotted instead of y. This curve is used in actuarial work.
8. The following data were obtained in a physical experiment, where E represents the energy radiated from a carbon filament lamp per cm² per sec, and T the absolute temperature of the filament in thousands of degrees K.

T    1.309   1.471   1.490   1.565   1.611   1.680

E    2.138   3.421   3.597   4.340   4.882   5.660

By plotting on log-log graph paper it is seen that the data apparently follow a law of the type $E = aT^b$, with a = 0.725 and b = 4.0 approximately. Use the Seidel method to improve these values of a and b.

Chapter 13


SOME REMARKS ON MULTIVARIATE 
PROBLEMS AND STOCHASTIC PROCESSES 


13.1 Multiple Regression in Terms of Correlation  In Chapter 12 we considered the multiple regression of one variate on a number of others, and the present treatment is closely connected with that in §§ 12.1 to 12.6.

For simplicity we will first assume that the predicted variate Y depends on just two predictors $X_1$ and $X_2$. (The generalization to any larger number is easily made.) The linear predicting equation for Y is of the form

(13.1.1)  $y_c = b_0 + b_1 x_1 + b_2 x_2$

Note that we are not now using the dummy variate (which is always equal to 1) as in § 12.1. The new notation will be more convenient in the present context. Geometrically, Eq. (1) represents a plane in the three-dimensional sample space with coordinates $x_1$, $x_2$ and $y$. This is called the regression plane of Y on $X_1$ and $X_2$.

Suppose that N sets of observations $(x_{1\alpha}, x_{2\alpha}, y_\alpha)$ are made on the three variates. For the sake of uniformity we will let the variate Y be called $X_0$, with values denoted by $x_{0\alpha}$. The normal equations corresponding to Eq. (12.1.5)
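will now be

(13.1.2)
$N b_0 + b_1 S x_{1\alpha} + b_2 S x_{2\alpha} = S x_{0\alpha}$
$b_0 S x_{1\alpha} + b_1 S(x_{1\alpha}^2) + b_2 S(x_{1\alpha}x_{2\alpha}) = S(x_{0\alpha}x_{1\alpha})$
$b_0 S x_{2\alpha} + b_1 S(x_{1\alpha}x_{2\alpha}) + b_2 S(x_{2\alpha}^2) = S(x_{0\alpha}x_{2\alpha})$

To simplify these equations we can suppose that the origin is chosen at the arithmetic mean of each of the variates $X_0$, $X_1$ and $X_2$. Then $S x_{1\alpha} = S x_{2\alpha} = S x_{0\alpha} = 0$. Also, if $s_1^2$, $s_2^2$ and $s_0^2$ denote the sample variances of $X_1$, $X_2$ and $X_0$ respectively, and if $r_{ij}$ is the sample Pearson coefficient of correlation between $X_i$ and $X_j$ ($i, j = 0, 1, 2$), we have

(13.1.3)
$S(x_{1\alpha}^2) = (N - 1)s_1^2, \qquad S(x_{2\alpha}^2) = (N - 1)s_2^2$
$S(x_{1\alpha}x_{2\alpha}) = (N - 1)r_{12}s_1 s_2$, etc.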
so that the equations of (2) become

(13.1.4)
$b_0 = 0$
$b_1 s_1 + b_2 r_{12} s_2 = r_{01} s_0$
$b_1 r_{12} s_1 + b_2 s_2 = r_{02} s_0$

From these we get

(13.1.5)  $b_1 = \frac{s_0(r_{01} - r_{02}r_{12})}{s_1(1 - r_{12}^2)}, \qquad b_2 = \frac{s_0(r_{02} - r_{01}r_{12})}{s_2(1 - r_{12}^2)}$

If we let $R_{ij}$ be the cofactor of $r_{ij}$ in the determinant of the correlation matrix

(13.1.6)  $R = \begin{pmatrix} r_{00} & r_{01} & r_{02} \\ r_{01} & r_{11} & r_{12} \\ r_{02} & r_{12} & r_{22} \end{pmatrix}$

where, of course, $r_{ij} = 1$ when $i = j$, we may express $b_1$ and $b_2$ in the equivalent forms:

(13.1.7)  $b_1 = -\frac{s_0 R_{01}}{s_1 R_{00}}, \qquad b_2 = -\frac{s_0 R_{02}}{s_2 R_{00}}$

The equation of the regression plane may therefore be written in the symmetrical notation:

(13.1.8)  $\frac{x_0 R_{00}}{s_0} + \frac{x_1 R_{01}}{s_1} + \frac{x_2 R_{02}}{s_2} = 0$

If we have p variates on which $X_0$ may depend, the equation of the regression hyperplane of $X_0$ on $X_1, X_2 \ldots X_p$ is

(13.1.9)  $\sum_{i=0}^{p} \frac{x_i R_{0i}}{s_i} = 0$

where $R_{0i}$ is the cofactor of $r_{0i}$ in the determinant of R, the matrix with typical element $r_{ij}$. The predicted $x_0$ can be written, when $R_{00} \ne 0$,

(13.1.10)  $x_{0c} = -\sum_{i=1}^{p} \frac{R_{0i}\,s_0}{R_{00}\,s_i}\,x_i$

so that the relative contribution of the variate $X_i$ to the prediction of $X_0$ is measured by the coefficient $(R_{0i}s_0)/(R_{00}s_i)$. Equation (10) is, of course, precisely equivalent to Eqs. (12.1.1) and (12.3.2) but in a different notation. The purpose of giving this alternative form is to show the relation of the regression coefficients to the coefficients of correlation between the different variates.


13.2 Multiple Correlation  If $v_\alpha$ is the difference between the observed $x_{0\alpha}$ and the computed value given by Eq. (13.1.10) for the observed $x_{i\alpha}$ ($i = 1, 2 \ldots p$),

(13.2.1)  $S v_\alpha^2 = S\left(x_{0\alpha} + \sum_{i=1}^{p} \frac{R_{0i}\,s_0}{R_{00}\,s_i}\,x_{i\alpha}\right)^2$

This is the sum of squares of residuals. On using Eq. (13.1.3) it becomes

$S v_\alpha^2 = \frac{(N - 1)s_0^2}{R_{00}^2}\left[R_{00}^2 + 2R_{00}\sum_{i=1}^{p} R_{0i}r_{0i} + \sum_{i=1}^{p}\sum_{j=1}^{p} R_{0i}R_{0j}r_{ij}\right] = \frac{(N - 1)s_0^2}{R_{00}^2}\sum_{i,j=0}^{p} R_{0i}R_{0j}r_{ij}$

But we know from the properties of cofactors that $\sum_j R_{0j}r_{ij} = 0$ if $i \ne 0$ and $= d(R)$ if $i = 0$, where $d(R)$ is the determinant of R, so that

(13.2.2)  $S v_\alpha^2 = (N - 1)s_0^2\,\frac{d(R)}{R_{00}}$

The quantity $S(v_\alpha^2)/(N - 1)$ is the approximate variance of estimate of $x_0$, denoted by $s_{0.12\ldots p}^2$. Therefore

(13.2.3)  $s_{0.12\ldots p}^2 = s_0^2\,\frac{d(R)}{R_{00}}$

As in § 12.6, it may be proved that $(N - 1)s_{0.12\ldots p}^2/(N - p - 1)$ is an unbiased estimate of the corresponding population parameter $\sigma_{0.12\ldots p}^2$, usually denoted simply by $\sigma^2$.

The variance due to regression may be defined by

(13.2.4)  $s_{0(12\ldots p)}^2 = s_0^2 - s_{0.12\ldots p}^2 = s_0^2\left(1 - \frac{d(R)}{R_{00}}\right)$

The ratio of the variance due to regression to the total variance of $X_0$ (namely, $s_0^2$) is the square of the multiple correlation coefficient

(13.2.5)  $r_{0.12\ldots p}^2 = 1 - \frac{d(R)}{R_{00}}$

For the case p = 1, $d(R) = 1 - r_{01}^2$ and $R_{00} = 1$, so that $r_{0.12\ldots p}$ reduces to the ordinary correlation coefficient between $X_0$ and $X_1$.

EXAMPLE 1.  The variates $X_0$, $X_1$ and $X_2$ have pairwise correlation coefficients $r_{01} = 0.8$, $r_{02} = -0.7$, $r_{12} = -0.9$. The matrix R is

$R = \begin{pmatrix} 1.0 & 0.8 & -0.7 \\ 0.8 & 1.0 & -0.9 \\ -0.7 & -0.9 & 1.0 \end{pmatrix}$

so that $d(R) = 0.068$, $R_{00} = 0.19$. Therefore $r_{0.12}^2 = 1 - 0.36 = 0.64$, so that $r_{0.12} = 0.80$.
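The arithmetic of Example 1 can be checked directly from Eq. (13.2.5). The sketch below is illustrative only; the function names `det3` and `multiple_correlation_squared` are hypothetical, and the case is restricted to two predictors so that the determinant can be written out in full.

```python
import math

def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def multiple_correlation_squared(r01, r02, r12):
    """Squared multiple correlation of X0 on X1 and X2: 1 - d(R)/R00."""
    R = [[1.0, r01, r02],
         [r01, 1.0, r12],
         [r02, r12, 1.0]]
    d_R = det3(R)
    R00 = 1.0 - r12 ** 2        # cofactor of r00
    return 1.0 - d_R / R00

# Correlations of Example 1
r_squared = multiple_correlation_squared(0.8, -0.7, -0.9)
r_multiple = math.sqrt(r_squared)
```

Here `r_squared` comes out close to 0.64 and `r_multiple` close to 0.80, in agreement with the example.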


EXAMPLE 2  If $r_{01} = 0.6$ and $r_{02} = 0.4$, find $r_{12}$ so that $r_{0.12} = 1$.

If we write $r_{12} = r$, and substitute the known values in R, we find that the condition $d(R) = 0$ gives $r^2 - 0.48r - 0.48 = 0$. The positive root of this quadratic equation is $r = 0.97$. In this example there is perfect multiple correlation of $X_0$ with $X_1$ and $X_2$, in the sense that all the observed points lie in one regression plane, even though the individual correlations of $X_0$ with $X_1$ and $X_2$ separately are not large.

The variance of some future observed value $x_0$ corresponding to assigned values $x_1, x_2 \ldots x_p$ of the predictors (this set not being any of the N sets of values already used in computing the correlations) is given by Eq. (12.4.8). In the notation of the present chapter, and with the origin placed at the sample mean, the matrix A of § 12.1 is

(13.2.6)  $A = (N - 1)\begin{pmatrix} \dfrac{N}{N - 1} & 0 & \cdots & 0 \\ 0 & s_1^2 & \cdots & r_{1p}s_1 s_p \\ \vdots & \vdots & & \vdots \\ 0 & r_{1p}s_1 s_p & \cdots & s_p^2 \end{pmatrix}$

For the special case p = 2, $d(A) = N(N - 1)^2 s_1^2 s_2^2 (1 - r_{12}^2)$ and

(13.2.7)  $A^{-1} = (N - 1)^{-1}\begin{pmatrix} \dfrac{N - 1}{N} & 0 & 0 \\ 0 & s_1^{-2}(1 - r_{12}^2)^{-1} & -r_{12}(s_1 s_2)^{-1}(1 - r_{12}^2)^{-1} \\ 0 & -r_{12}(s_1 s_2)^{-1}(1 - r_{12}^2)^{-1} & s_2^{-2}(1 - r_{12}^2)^{-1} \end{pmatrix}$

so that

(13.2.8)  $V(x_0) = \sigma^2\left[1 + \frac{1}{N} + \frac{x_1^2/s_1^2 + x_2^2/s_2^2 - 2r_{12}x_1 x_2/(s_1 s_2)}{(N - 1)(1 - r_{12}^2)}\right]$

The multiple correlation coefficient may be regarded as the ordinary correlation coefficient between the observed and the computed values of $X_0$, the latter being given by Eq. (13.1.10). By using Eq. (4) for the variance of the N computed values, we obtain for this correlation coefficient:

(13.2.9)  $r = -\frac{s_0}{(N - 1)R_{00}}\sum_{i=1}^{p} \frac{R_{0i}}{s_i}\,S(x_{i\alpha}x_{0\alpha}) \Big/ \left[s_0\,s_{0(12\ldots p)}\right]$

Since $S x_{i\alpha}x_{0\alpha} = (N - 1)r_{0i}s_0 s_i$ and $\sum_{i=1}^{p} R_{0i}r_{0i} = d(R) - R_{00}$, this reduces to

(13.2.10)  $r = \frac{s_0}{s_{0(12\ldots p)}}\left(1 - \frac{d(R)}{R_{00}}\right) = \left[1 - \frac{d(R)}{R_{00}}\right]^{1/2}$
which is the same as $r_{0.12\ldots p}$.

* 13.3 The Distribution of the Multiple Correlation Coefficient  From Eqs. (13.2.2) and (13.2.5),

(13.3.1)  $1 - r_{0.12\ldots p}^2 = \frac{S(v_\alpha^2)}{S(x_{0\alpha}^2)} = \frac{d(R)}{R_{00}}$

whence

(13.3.2)  $\frac{r_{0.12\ldots p}^2}{1 - r_{0.12\ldots p}^2} = \frac{S(x_{0\alpha}^2) - S(v_\alpha^2)}{S(v_\alpha^2)} = \frac{S(x_{0c\alpha}^2)}{S(v_\alpha^2)}$

The numerator is the sum of squares of the calculated values of the variate $X_0$ (the sum of squares due to regression), while the denominator is the sum of squares about the regression plane. If the variates $X_1, X_2 \ldots X_p$ all have fixed sets of values and if $X_0$ is a random normal variate independent of $X_1, X_2 \ldots X_p$ (which means that the true multiple correlation coefficient is zero), the numerator and denominator are independently distributed as $\chi^2\sigma^2$ with p and N - p - 1 d.f. respectively. Here $\sigma^2$ stands for $\sigma_{0.12\ldots p}^2$, the population variance about the true regression plane. It follows that

(13.3.3)  $F = \frac{(N - p - 1)\,S(x_{0c\alpha}^2)}{p\,S(v_\alpha^2)}$

has the F distribution with p and N - p - 1 d.f. This means that $r_{0.12\ldots p}^2$ is a beta-variate with parameters p/2 and (N - p - 1)/2 and so its distribution is identical with that of the squared correlation ratio $E_{yx}^2$ (see § 12.10) with p + 1 instead of p. The slight change arises from the fact that we are now dealing with p + 1 variates altogether, namely, $X_0$ (or Y) and $X_1, X_2 \ldots X_p$.

The distribution of the multiple correlation coefficient, when the corresponding coefficient $\rho_{0.12\ldots p}$ in the population is not zero, was worked out by Fisher [1]. The density function is

(13.3.4)  $f(r^2) = (1 - \rho^2)^{(N-1)/2}\,(r^2)^{(p-2)/2}\,(1 - r^2)^{(N-p-3)/2}\,g(r^2)$

where $r^2$ and $\rho^2$ are written for the squares of the multiple correlation coefficient in the sample and in the population, and where

$g(r^2) = \frac{F[(N - 1)/2,\ (N - 1)/2,\ p/2;\ \rho^2 r^2]}{B[p/2,\ (N - p - 1)/2]}$

the numerator being a hypergeometric function and the denominator a beta function.

The definition of the hypergeometric function as a series is

(13.3.5)  $F(a, b, c; z) = 1 + \frac{ab}{c}\,z + \frac{a(a + 1)b(b + 1)}{c(c + 1)}\,\frac{z^2}{2!} + \cdots = \sum_{n=0}^{\infty} \frac{[a]_n [b]_n}{[c]_n}\,\frac{z^n}{n!}$

where $[a]_n = a(a + 1)(a + 2) \ldots (a + n - 1)$. For the properties of this function see references [2] and [3].
The expected value of $r^2$ is given by

(13.3.6)  $E(r^2) = 1 - \frac{N - p - 1}{N - 1}\,(1 - \rho^2)\,F\!\left(1, 1, \frac{N + 1}{2}; \rho^2\right) = 1 - \frac{N - p - 1}{N - 1}\,(1 - \rho^2)\left(1 + \frac{2\rho^2}{N + 1} + \cdots\right)$

When $\rho = 0$ this reduces to

(13.3.7)  $E(r^2) = 1 - \frac{N - p - 1}{N - 1} = \frac{p}{N - 1}$

It may be noted that Fisher's z'-transformation (§ 11.13) can be applied to the multiple correlation coefficient and brings about approximate normality, for moderately large N. The variance of z' is p/(N - p - 2), approximately.
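The series (13.3.5) is easy to sum term by term when |z| < 1, and doing so gives a numerical value for the expected value in Eq. (13.3.6). The following sketch is an illustration only; the function names `hypergeometric_series` and `expected_r_squared` and the default of 200 terms are assumptions, not part of the text.

```python
import math

def hypergeometric_series(a, b, c, z, terms=200):
    """Partial sum of F(a, b, c; z) = sum over n of [a]_n [b]_n / [c]_n * z^n / n!."""
    total = 0.0
    term = 1.0                      # the n = 0 term
    for n in range(terms):
        total += term
        # ratio of the (n + 1)th term to the nth term
        term *= (a + n) * (b + n) / (c + n) * z / (n + 1)
    return total

def expected_r_squared(N, p, rho_squared):
    """E(r^2) from Eq. (13.3.6) for sample size N, p predictors and population rho^2."""
    factor = (N - p - 1) / (N - 1)
    series = hypergeometric_series(1.0, 1.0, (N + 1) / 2.0, rho_squared)
    return 1.0 - factor * (1.0 - rho_squared) * series

# A known closed form serves as a check: F(1, 1, 2; z) = -log(1 - z) / z
check = hypergeometric_series(1.0, 1.0, 2.0, 0.5)

# With rho = 0 the expected value should equal p / (N - 1), as in Eq. (13.3.7)
null_value = expected_r_squared(20, 3, 0.0)
```

The value `null_value` shows that even when the population coefficient is zero the sample $r^2$ has a positive expectation, which grows with the number of predictors.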


13.4 Partial Correlation  Sometimes we would like to know what the correlation would be between say $X_0$ (or Y) and $X_1$ if the influence of all other variates such as $X_2, X_3 \ldots X_p$ were eliminated. This is called the partial correlation of $X_0$ and $X_1$ and the coefficient is written $r_{01.23\ldots p}$. It is in general different from the ordinary correlation coefficient $r_{01}$ for $X_0$ and $X_1$. For simplicity we consider the case of three variates, $X_0$, $X_1$ and $X_2$, for which the three pairwise Pearson coefficients of correlation are $r_{01}$, $r_{02}$, $r_{12}$, and we suppose that $X_2$ is the variate to be eliminated. The effect of $X_2$ on $X_0$ is estimated from the ordinary regression of $X_0$ on $X_2$, ignoring $X_1$, and is given by $r_{02}s_0 x_2/s_2$ for a measured value $x_2$. (Each variate is supposed measured from its own mean as origin.) The residual part of the observed $x_0$, after subtracting the part due to $X_2$, is

(13.4.1)  $x_{0.2} = x_0 - r_{02}s_0\,\frac{x_2}{s_2}$

In the same way, the residual part of the observed $x_1$ is

(13.4.2)  $x_{1.2} = x_1 - r_{12}s_1\,\frac{x_2}{s_2}$

The partial correlation coefficient $r_{01.2}$ is defined as the ordinary correlation coefficient for $x_{0.2}$ and $x_{1.2}$. That is

(13.4.3)  $r_{01.2} = \frac{S_\alpha (x_{0.2})_\alpha (x_{1.2})_\alpha}{(N - 1)\,s_{0.2}\,s_{1.2}}$

the numerator being summed over all the N sets of observations. In the denominator, $s_{0.2}^2$ is the residual variance of $x_0$ after eliminating the regression on $x_2$, so that

(13.4.4)  $s_{0.2}^2 = s_0^2(1 - r_{02}^2)$

Similarly,

(13.4.5)  $s_{1.2}^2 = s_1^2(1 - r_{12}^2)$

Using Eqs. (1) and (2), we find

(13.4.6)  $S_\alpha (x_{0.2})_\alpha (x_{1.2})_\alpha = S x_{0\alpha}x_{1\alpha} - r_{02}\,\frac{s_0}{s_2}\,S x_{1\alpha}x_{2\alpha} - r_{12}\,\frac{s_1}{s_2}\,S x_{0\alpha}x_{2\alpha} + r_{02}r_{12}\,\frac{s_0 s_1}{s_2^2}\,S x_{2\alpha}^2$

$= (N - 1)\left(r_{01}s_0 s_1 - r_{02}\,\frac{s_0}{s_2}\,r_{12}s_1 s_2 - r_{12}\,\frac{s_1}{s_2}\,r_{02}s_0 s_2 + r_{02}r_{12}s_0 s_1\right)$

$= (N - 1)s_0 s_1(r_{01} - r_{02}r_{12})$

Therefore,

(13.4.7)  $r_{01.2} = \frac{r_{01} - r_{02}r_{12}}{[(1 - r_{02}^2)(1 - r_{12}^2)]^{1/2}}$

This expresses the partial correlation coefficient in terms of the three ordinary correlation coefficients. In terms of the correlation matrix R, as defined in Eq. (13.1.6),

(13.4.8)  $r_{01.2} = -\frac{R_{01}}{(R_{00}R_{11})^{1/2}}$

and this form can be generalized for p variates $X_1, X_2 \ldots X_p$. Thus

(13.4.9)  $r_{01.2\ldots p} = -\frac{R_{01}}{(R_{00}R_{11})^{1/2}}$

where the correlation matrix now has p + 1 rows and columns.

In certain circumstances the partial correlation coefficient $r_{01.2}$ is the same as the ordinary coefficient $r_{01}$ when the third variate $X_2$ is held constant. That is, if we select out of the set of all observations a subset in which $X_2 = x_2$ (very nearly), and calculate $r_{01}$ for this subset, we get in effect $r_{01.2}$. In general this is not true, and the result depends on the chosen value $x_2$; the calculated $r_{01}$ is equal to $r_{01.2}$ if and only if (a) the bivariate regression of $X_1$ on $X_2$ (ignoring $X_0$) is linear, with the variance of $X_1$ constant for all $X_2$; (b) the trivariate regression of $X_0$ on $X_1$ and $X_2$ is linear with the variance of $X_0$ constant for all $X_1$ and $X_2$. These conditions are not likely to be satisfied very precisely in practical applications, and the calculated value of $r_{01.2}$ will be a sort of average of the correlations $r_{01}$ that would be obtained for different assigned values of $x_2$ (see Problem A-6).

EXAMPLE 3  The following results were obtained at Syracuse University in an investigation by M. A. May of the factors influencing "academic success." The sample consisted of 450 students and the variates were Y (honor points), $X_1$ (general intelligence) and $X_2$ (hours of study per week). One object of the investigation was to find to what extent honor points were related to general intelligence, when the effect of varying study periods was eliminated. From the data,

$\bar{x}_0 = 185$,  $\bar{x}_1 = 100.6$,  $\bar{x}_2 = 24$
$s_0 = 11.2$,  $s_1 = 15.8$,  $s_2 = 6.0$
$r_{01} = 0.60$,  $r_{02} = 0.32$,  $r_{12} = -0.35$

We find from Eq. (7) that $r_{01.2} = 0.80$. The multiple correlation of $X_0$ on $X_1$ and $X_2$, as given by Eq. (13.2.5), is $r_{0.12} = 0.82$. The regression coefficients, from Eq. (13.1.7), are $b_1 = 0.58$, $b_2 = 1.13$.

The sampling distribution of the partial correlation coefficient $r_{01.2}$ is the same as that of $r_{01}$ (see § 11.12) but with N - 1 instead of N and with $\rho_{01.2}$ instead of $\rho$. With p + 1 variates, $X_0, X_1 \ldots X_p$, the density function for $r_{01.2\ldots p}$ has N - p + 1 instead of N.
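The three results quoted in Example 3 follow from Eqs. (13.4.7), (13.2.5) and (13.1.5), and may be verified with a short calculation. This is a sketch only; the function names are hypothetical, and the formulas are written out for the three-variate case.

```python
import math

def partial_correlation(r01, r02, r12):
    """Partial correlation of X0 and X1 with X2 eliminated, Eq. (13.4.7)."""
    return (r01 - r02 * r12) / math.sqrt((1 - r02 ** 2) * (1 - r12 ** 2))

def multiple_correlation(r01, r02, r12):
    """Multiple correlation of X0 on X1 and X2, Eq. (13.2.5)."""
    d_R = 1 + 2 * r01 * r02 * r12 - r01 ** 2 - r02 ** 2 - r12 ** 2
    R00 = 1 - r12 ** 2
    return math.sqrt(1 - d_R / R00)

def regression_coefficients(s0, s1, s2, r01, r02, r12):
    """Regression coefficients b1 and b2 from Eq. (13.1.5)."""
    b1 = s0 * (r01 - r02 * r12) / (s1 * (1 - r12 ** 2))
    b2 = s0 * (r02 - r01 * r12) / (s2 * (1 - r12 ** 2))
    return b1, b2

# Data of Example 3
r01, r02, r12 = 0.60, 0.32, -0.35
s0, s1, s2 = 11.2, 15.8, 6.0

r_partial = partial_correlation(r01, r02, r12)
r_multiple = multiple_correlation(r01, r02, r12)
b1, b2 = regression_coefficients(s0, s1, s2, r01, r02, r12)
```

Because $r_{02}$ and $r_{12}$ have opposite signs, eliminating the hours of study raises the correlation between honor points and intelligence from 0.60 to about 0.80.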


13.5 The Multivariate Normal Distribution  The univariate normal distribution has the density function (when the variable is standardized so as to have mean zero and variance unity)

(13.5.1)  $f(x) = (2\pi)^{-1/2}\exp(-x^2/2)$

The importance and central position of this distribution in statistical theory have already been emphasized in earlier chapters. A corresponding position in multivariate theory is taken by the standardized multivariate normal distribution, with joint density function

(13.5.2)  $f(x_0, x_1 \ldots x_p) = (2\pi)^{-(p+1)/2}\,|d(A)|^{1/2}\exp(-Q/2)$

where

(13.5.3)  $Q = x'Ax$

Here $x'$ is the row vector $(x_0\ x_1 \ldots x_p)$ and $x$ is its transpose (a column vector), while A is a symmetric positive definite matrix of p + 1 rows and columns. It is, in fact, the inverse of the correlation matrix P.

(13.5.4)  $P = A^{-1} = \begin{pmatrix} 1 & \rho_{01} & \cdots & \rho_{0p} \\ \rho_{01} & 1 & \cdots & \rho_{1p} \\ \vdots & \vdots & & \vdots \\ \rho_{0p} & \rho_{1p} & \cdots & 1 \end{pmatrix}$

For the bivariate case, in agreement with the customary notation, we will write $x_0 = x$, $x_1 = y$, $\rho_{01} = \rho$. Then

$P = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad A = P^{-1} = (1 - \rho^2)^{-1}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}$

$d(A) = (1 - \rho^2)^{-1}$

$Q = (x\ \ y)\,A\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x^2 + y^2 - 2\rho xy}{1 - \rho^2}$

so that

(13.5.5)  $f(x, y) = (2\pi)^{-1}(1 - \rho^2)^{-1/2}\exp\left[-\frac{x^2 + y^2 - 2\rho xy}{2(1 - \rho^2)}\right]$

This is the bivariate normal distribution in standardized form. Tables of this function for selected values of $\rho$ may be found in references [4] and [5]. A method has been given [6] for reducing the integral of the multivariate function to a bivariate integral and thereby obtaining numerical values.


* 13.6 The Relation of the Multivariate and Multinomial Distributions  We saw in Chapter 3 that with increasing sample size the binomial distribution with parameter $\theta$ tends to a normal distribution with mean $N\theta$ and variance $N\theta(1 - \theta)$. A similar result holds for the multinomial distribution (Appendix A.16 and 17). If the probability that a random item from the population belongs to the $i$th class is $\pi_i$ ($i = 1, 2 \ldots k$), the observed frequencies $f_i$ in the various classes will tend, as the total sample size $N\ (= \sum f_i)$ increases, to a multivariate normal distribution with means $N\pi_i$ and variance-covariance matrix V, where

(13.6.1)  $V = N\begin{pmatrix} \pi_1(1 - \pi_1) & -\pi_1\pi_2 & \cdots & -\pi_1\pi_k \\ -\pi_1\pi_2 & \pi_2(1 - \pi_2) & \cdots & -\pi_2\pi_k \\ \vdots & \vdots & & \vdots \\ -\pi_1\pi_k & -\pi_2\pi_k & \cdots & \pi_k(1 - \pi_k) \end{pmatrix}$

Because of the fact that $\sum \pi_i = 1$, we have $d(V) = 0$, which means that the matrix V is singular and cannot be inverted. However, we may omit one of the variates $f_i$ (say $f_k$) and express it in terms of the remaining k - 1 variates by the relation $f_k = N - f_1 - f_2 - \ldots - f_{k-1}$. These k - 1 frequencies are multinormally distributed with means $N\pi_i$ and variance-covariance matrix V*, which is just V with the last row and the last column omitted. The inverse of V* is

(13.6.2)  $(V^*)^{-1} = N^{-1}\begin{pmatrix} \pi_1^{-1} + \pi_k^{-1} & \pi_k^{-1} & \cdots & \pi_k^{-1} \\ \pi_k^{-1} & \pi_2^{-1} + \pi_k^{-1} & \cdots & \pi_k^{-1} \\ \vdots & \vdots & & \vdots \\ \pi_k^{-1} & \pi_k^{-1} & \cdots & \pi_{k-1}^{-1} + \pi_k^{-1} \end{pmatrix}$

as may be checked by multiplying it with V*, bearing in mind that $\pi_k = 1 - \sum_{i=1}^{k-1} \pi_i$. The quantity Q of Eq. (13.5.3) which appears in the exponent of the multivariate normal distribution is, in this case,

(13.6.3)  $Q = (f_i - N\pi_i)'\,(V^*)^{-1}\,(f_i - N\pi_i)$

where $(f_i - N\pi_i)'$ is a row vector of k - 1 elements and $(f_i - N\pi_i)$ is the corresponding column vector. On carrying out the multiplication, we find

(13.6.4)  $Q = \sum_{i=1}^{k-1} \frac{(f_i - N\pi_i)^2}{N\pi_i} + \sum_{i=1}^{k-1}\sum_{j=1}^{k-1} \frac{(f_i - N\pi_i)(f_j - N\pi_j)}{N\pi_k}$

$= \sum_{i=1}^{k-1} \frac{(f_i - N\pi_i)^2}{N\pi_i} + \frac{(f_k - N\pi_k)^2}{N\pi_k}$

$= \sum_{i=1}^{k} \frac{(f_i - N\pi_i)^2}{N\pi_i}$

As shown in Appendix A.17, Q is approximately distributed as $\chi^2$ with k - 1 degrees of freedom. This fact is the basis of the ordinary $\chi^2$ test for goodness of fit, already discussed in § 10.3.
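The identity in Eq. (13.6.4) can be confirmed numerically by evaluating Q in both ways: as the quadratic form in the first k - 1 deviations, using the inverse matrix (13.6.2), and as the familiar sum over all k classes. The sketch below is an illustration; the frequencies and class probabilities are hypothetical, as are the function names.

```python
def q_quadratic_form(freqs, probs):
    """Q as the quadratic form (13.6.3) in the first k - 1 deviations."""
    n = sum(freqs)
    k = len(freqs)
    dev = [freqs[i] - n * probs[i] for i in range(k - 1)]
    q = 0.0
    for i in range(k - 1):
        for j in range(k - 1):
            # element (i, j) of the inverse matrix in Eq. (13.6.2)
            element = 1.0 / probs[k - 1]
            if i == j:
                element += 1.0 / probs[i]
            q += dev[i] * (element / n) * dev[j]
    return q

def q_chi_square(freqs, probs):
    """Q as the sum over all k classes of (f - N pi)^2 / (N pi)."""
    n = sum(freqs)
    return sum((f - n * p) ** 2 / (n * p) for f, p in zip(freqs, probs))

# Hypothetical observed frequencies and class probabilities (k = 3, N = 100)
freqs = [18, 30, 52]
probs = [0.2, 0.3, 0.5]

q_form = q_quadratic_form(freqs, probs)
q_sum = q_chi_square(freqs, probs)
```

Both expressions give the same value, so the choice of which class to omit in forming V* does not affect Q.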


* 13.7 Hotelling's Generalization of Student's t  Hotelling [7] investigated the properties of a statistic T which generalizes Student's t for p variates. We recall that Student's t is the ratio of the difference between the sample mean and the population mean to the estimated standard deviation of the sample mean, so that

(13.7.1)  $t^2 = \frac{N(\bar{x} - \mu)^2}{s^2}$

Hotelling's T is a standardized measure of the departure of all the p sample means from their population values. Let

(13.7.2)  $S_{ij} = S_\alpha (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j)$

and let $(S^{ij})$ be the inverse of the matrix $(S_{ij})$. Then T is defined by

(13.7.3)  $T^2 = N(N - 1)\sum_{i,j} S^{ij}(\bar{x}_i - \mu_i)(\bar{x}_j - \mu_j)$

where N is the sample size. It is easily seen that when p = 1, $T^2$ reduces to $t^2$. For in this case, if $x_1 = x$,

$S_{11} = S(x_\alpha - \bar{x})^2 = (N - 1)s^2$

so that

$S^{11} = [(N - 1)s^2]^{-1}$

and

$T^2 = \frac{N(\bar{x} - \mu)^2}{s^2} = t^2$

This statistic T, given by Eq. (3), may be used, just as t is used, to test the null hypothesis that $\mu_i = \mu_{i0}$ ($i = 1, 2 \ldots p$) in the population, $\mu_i$ being the true value of the mean of $X_i$, and $\mu_{i0}$ an assumed value. On the assumption that the population is multivariate normal, with covariance matrix $(\sigma_{ij})$, the null hypothesis $H_0$ is rejected when $T^2 > T_0^2$, where

(13.7.4)  $T^2 = N(N - 1)\sum_{i,j} S^{ij}(\bar{x}_i - \mu_{i0})(\bar{x}_j - \mu_{j0})$

and where $T_0^2$ is chosen so that the probability that $T^2 > T_0^2$, when $H_0$ is true, is equal to some assigned value.

Hotelling showed that under $H_0$ the quantity $u = T^2/(N - 1)$ is a beta-prime variate with density function

(13.7.5)  $f(u) = \frac{1}{B[p/2,\ (N - p)/2]}\cdot\frac{u^{(p-2)/2}}{(1 + u)^{N/2}}$

This is equivalent to the statement that $[T^2(N - p)]/[(N - 1)p]$ has the ordinary F distribution with p and N - p degrees of freedom. When $H_0$ is not true, and therefore $\mu_i - \mu_{i0}$ is not zero, the distribution is non-central F, with the same degrees of freedom and with non-centrality parameter $\lambda = \frac{N}{2}\sum_{i,j} \sigma^{ij}(\mu_i - \mu_{i0})(\mu_j - \mu_{j0})$. The non-central F distribution has the density function

(13.7.6)  $f(F) = e^{-\lambda}\sum_{\beta=0}^{\infty} \frac{\lambda^\beta}{\beta!}\cdot\frac{\left(\dfrac{p}{N - p}\right)^{p/2+\beta} F^{\,p/2+\beta-1}}{B[p/2 + \beta,\ (N - p)/2]\left(1 + \dfrac{pF}{N - p}\right)^{N/2+\beta}}$


This reduces when $\lambda = 0$ to the ordinary central form as in Eq. (8.15.5). Tables calculated by Tang (see §§ 9.12 and 12.10) give the probability of accepting the null hypothesis when it is not true, for various values of $\lambda$ and for significance levels 0.05 and 0.01. His number of degrees of freedom $f_1$ is our p, his $f_2$ is our N - p, and his non-centrality parameter $\phi$ is related to our $\lambda$ by

(13.7.7)  $(p + 1)\phi^2 = 2\lambda$

Also the variate which Tang denotes by $E^2$ is our $T^2/(T^2 + N - 1)$. This has the same distribution as the correlation ratio, which is the reason for Tang's notation. Further details of the T test and its optimum properties may be found in [8].
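For two variates the computation of $T^2$ in Eq. (13.7.4), and of the quantity $T^2(N - p)/[(N - 1)p]$ which is referred to the F distribution, can be written out explicitly. The following is a minimal sketch for p = 2 only; the function name `hotelling_t2` and the four sample points are hypothetical.

```python
def hotelling_t2(sample, mu0):
    """T^2 and the equivalent F value for p = 2 variates.

    sample: list of (x1, x2) pairs; mu0: assumed population means (mu10, mu20).
    """
    n = len(sample)
    p = 2
    m1 = sum(x1 for x1, _ in sample) / n
    m2 = sum(x2 for _, x2 in sample) / n
    # Sums of squares and products about the sample means, Eq. (13.7.2)
    S11 = sum((x1 - m1) ** 2 for x1, _ in sample)
    S22 = sum((x2 - m2) ** 2 for _, x2 in sample)
    S12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in sample)
    # Elements of the inverse matrix (S^ij)
    det = S11 * S22 - S12 ** 2
    i11, i12, i22 = S22 / det, -S12 / det, S11 / det
    d1, d2 = m1 - mu0[0], m2 - mu0[1]
    t2 = n * (n - 1) * (i11 * d1 * d1 + 2 * i12 * d1 * d2 + i22 * d2 * d2)
    f_value = t2 * (n - p) / ((n - 1) * p)   # F with p and n - p d.f. under H0
    return t2, f_value

# Hypothetical sample of four observations on two variates
sample = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]

t2_null, f_null = hotelling_t2(sample, (2.5, 2.5))   # assumed means equal the sample means
t2, f_value = hotelling_t2(sample, (0.0, 0.0))
```

The resulting `f_value` would be compared with a tabulated F value for p and N - p degrees of freedom; for more than two variates a general matrix inversion would replace the explicit 2 x 2 inverse used here.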


13.8 Discriminant Functions  Suppose we wish to assign several individuals, on the basis of a measured variate X, to one or other of two populations A and B which differ in their means but whose distributions may overlap (Fig. 54). If the mean of A ($\mu_A$) is greater than the mean of B ($\mu_B$), we would naturally assign an individual with a high value of X to population A and one with a low X to population B. If the curves representing the density functions for the two populations intersect at $X = \alpha$, we might well take $\alpha$ as the dividing point. There will, of course, be a certain risk of mis-classification. The probability of classifying an individual who is really an A as belonging to B is $F_A(\alpha)$, where $F_A(x)$ is the distribution function for population A. Similarly the probability of classifying a B as an A is $1 - F_B(\alpha)$.

FIG. 54  DISCRIMINATION BETWEEN TWO POPULATIONS (overlapping density curves for populations B and A plotted against X)


In practice we often have several variates $X_1, X_2 \ldots X_p$ which may be used for discrimination, and the problem then arises of choosing the best function of these variates for discriminating with the least error between populations A and B. An example is the use of intelligence, aptitude and achievement tests of various kinds, along with high school records, for attempting to assess whether a student planning to enter a university is, or is not, likely to graduate in, say, engineering. A student adviser would be glad to have available a suitable function of the various test scores to assist him in coming to a decision. A function of this kind is called a discriminant function. Individuals with a value of the function greater than some fixed value $\alpha$ will be classed as A, those with a smaller value as B.

Let us assume that we want a linear function of the measured $x_i$, say

(13.8.1)  $L = b_1 x_1 + b_2 x_2 + \ldots + b_p x_p$

and would like to choose the $b_i$ so that L will be as efficient as possible in discriminating between the two populations. Suppose that of N sample items available (on each of which p measurements are made and for each of which the proper classification is known) there are $N_1$ from population A and $N_2$ from population B.

The measurements on variate $X_i$ from population A will be denoted by $x_{1i\alpha}$, $\alpha = 1, 2 \ldots N_1$, and those from population B by $x_{2i\beta}$, $\beta = 1, 2 \ldots N_2$, and it is assumed that the two sets $x_{1i}$ and $x_{2i}$ have each a p-variate normal distribution, independent of each other, with means $\mu_{1i}$, $\mu_{2i}$ respectively and a common covariance matrix $(\sigma_{ij})$. A single later sample item gives the set of observations $x_i$ ($i = 1, 2 \ldots p$) and it is known that this item belongs to either A or B but it is not known to which. The discriminant function helps us to make the decision.

The null hypothesis $H_1$ is that the new item belongs to A; the alternative hypothesis $H_2$ is that it belongs to B. By the Neyman-Pearson theorem (see § 6.9) the most powerful critical region for testing $H_1$ against $H_2$ is given by

(13.8.2)  $\frac{p_1(x_1, x_2 \ldots x_p)}{p_2(x_1, x_2 \ldots x_p)} < k$

where $p_1$ and $p_2$ denote the joint probability density functions for $(x_1, x_2 \ldots x_p)$ under $H_1$ and $H_2$ respectively, and where k is a constant determined by the size of the critical region.
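With the assumptions we have made,

(13.8.3)
$p_1 = (2\pi)^{-p/2}[d(\sigma_{ij})]^{-1/2}\exp\left[-\tfrac{1}{2}\sum_{i,j} \sigma^{ij}(x_i - \mu_{1i})(x_j - \mu_{1j})\right]$
$p_2 = (2\pi)^{-p/2}[d(\sigma_{ij})]^{-1/2}\exp\left[-\tfrac{1}{2}\sum_{i,j} \sigma^{ij}(x_i - \mu_{2i})(x_j - \mu_{2j})\right]$

where $d(\sigma_{ij})$ is the determinant of the matrix $(\sigma_{ij})$. Then Eq. (2) is equivalent (on taking logs of both sides) to

(13.8.4)  $\sum_{i,j} \sigma^{ij}\left[(x_i - \mu_{2i})(x_j - \mu_{2j}) - (x_i - \mu_{1i})(x_j - \mu_{1j})\right] < 2\log k$

However, the population parameters $\mu_{1i}$, $\mu_{2i}$, $\sigma_{ij}$ are unknown, and we must replace them by estimators. The optimum estimators of $\mu_{1i}$ and $\mu_{2i}$ are

(13.8.5)  $\bar{x}_{1i} = N_1^{-1}\,S_\alpha x_{1i\alpha}, \qquad \bar{x}_{2i} = N_2^{-1}\,S_\beta x_{2i\beta}$

respectively, while that of $\sigma_{ij}$ is

(13.8.6)  $s_{ij} = \frac{S_{ij}}{N_1 + N_2 - 2}$

where

(13.8.7)  $S_{ij} = S_\alpha (x_{1i\alpha} - \bar{x}_{1i})(x_{1j\alpha} - \bar{x}_{1j}) + S_\beta (x_{2i\beta} - \bar{x}_{2i})(x_{2j\beta} - \bar{x}_{2j})$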
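On substituting these estimates in Eq. (4) we obtain the test statistic

(13.8.8)    $R = \sum_{i,j} s^{ij}[(x_i - \bar{x}_{2i})(x_j - \bar{x}_{2j}) - (x_i - \bar{x}_{1i})(x_j - \bar{x}_{1j})]$

where $(s^{ij})$ is the inverse of $(s_{ij})$. Now $R$ can be written

    $R = \sum_{i,j} s^{ij}[x_i(\bar{x}_{1j} - \bar{x}_{2j}) + x_j(\bar{x}_{1i} - \bar{x}_{2i}) - \bar{x}_{1i}\bar{x}_{1j} + \bar{x}_{2i}\bar{x}_{2j}]$

and because $s^{ij} = s^{ji}$,

    $\sum_{i,j} s^{ij} x_i(\bar{x}_{1j} - \bar{x}_{2j}) = \sum_{i,j} s^{ij} x_j(\bar{x}_{1i} - \bar{x}_{2i})$

Also the last two terms in the square bracket do not depend on $x_i$. The test
$R < c$ is therefore equivalent to

(13.8.9)    $L = \sum_{i,j} s^{ij} x_i(\bar{x}_{1j} - \bar{x}_{2j}) < k'$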
ij 


where $k'$ is a new constant, suitably chosen. Let us denote the difference between
the two sample means for the variate $X_i$ by $d_i$ so that

(13.8.10)    $d_i = \bar{x}_{1i} - \bar{x}_{2i}$

Then

    $L = \sum_i b_i x_i$

with

(13.8.11)    $b_i = \sum_j s^{ij} d_j$

This function $L$ is a discriminant function for assigning the new item to either
A or B. If $L < k'$, we shall assign it to B and if $L > k'$ to A. It is convenient
to take $k' = \tfrac12 \sum_i b_i(\bar{x}_{1i} + \bar{x}_{2i})$.
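The steps from Eq. (13.8.5) to Eq. (13.8.11) can be sketched numerically. The
following is a minimal illustration only: the function names `discriminant` and
`classify` and the two small samples are hypothetical and are not taken from the
text, and NumPy is assumed to be available.

```python
import numpy as np

def discriminant(sample_a, sample_b):
    """Return (b, k_prime) for L = sum_i b_i x_i, following Eqs. (13.8.5)-(13.8.11)."""
    sample_a = np.asarray(sample_a, dtype=float)   # shape (N1, p)
    sample_b = np.asarray(sample_b, dtype=float)   # shape (N2, p)
    n1, n2 = len(sample_a), len(sample_b)
    mean_a = sample_a.mean(axis=0)                 # Eq. (13.8.5)
    mean_b = sample_b.mean(axis=0)
    dev_a = sample_a - mean_a
    dev_b = sample_b - mean_b
    S = dev_a.T @ dev_a + dev_b.T @ dev_b          # Eq. (13.8.7)
    s = S / (n1 + n2 - 2)                          # Eq. (13.8.6)
    d = mean_a - mean_b                            # Eq. (13.8.10)
    b = np.linalg.solve(s, d)                      # Eq. (13.8.11): b = s^{-1} d
    k_prime = 0.5 * b @ (mean_a + mean_b)          # midpoint constant k'
    return b, k_prime

def classify(x, b, k_prime):
    """Assign to A if L > k', otherwise to B."""
    return "A" if b @ np.asarray(x, dtype=float) > k_prime else "B"

# Hypothetical measurements (p = 2) on four items from each population.
sample_a = [[5.0, 3.0], [6.0, 3.5], [5.5, 2.5], [6.5, 3.0]]
sample_b = [[2.0, 1.0], [3.0, 1.5], [2.5, 0.5], [3.5, 1.0]]
b, k_prime = discriminant(sample_a, sample_b)
print(classify([6.0, 3.0], b, k_prime))
print(classify([2.5, 1.0], b, k_prime))
```

Solving the linear system rather than inverting $(s_{ij})$ explicitly is a design
choice of this sketch; it gives the same $b_i$.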

The same discriminant function is obtained by using a different approach
(due to Fisher). The constants $b_i$ in the function $L$ are chosen so as to make
the sum of squares between populations for the given sample items as great
as possible, relative to the sum of squares and products within populations. Of
course, it is only the ratios of the $b_i$ that matter, so a constant multiplier makes
no essential difference.

It may be noted that the discriminant function is related to Hotelling's
statistic $T$. We can define a generalized statistic $T_2$ for the two-population
case by

(13.8.12)    $T_2^2 = \sum_{i,j} s^{ij}[N_1(\bar{x}_{1i} - \bar{x}_i)(\bar{x}_{1j} - \bar{x}_j) + N_2(\bar{x}_{2i} - \bar{x}_i)(\bar{x}_{2j} - \bar{x}_j)]$

where

(13.8.13)    $\bar{x}_i = \dfrac{N_1\bar{x}_{1i} + N_2\bar{x}_{2i}}{N_1 + N_2}$

which is the combined sample mean for $X_i$, and $s_{ij}$ is given by Eqs. (6) and (7)
above. The matrix $(s^{ij})$ is, as usual, the inverse of $(s_{ij})$. The quantity $S_{ij}$ is
the sum of squares and products within populations. By substituting from
Eq. (13) in Eq. (12) we obtain, after a little reduction,

(13.8.15)    $T_2^2 = \nu \sum_{i,j} s^{ij} d_i d_j$

where $\nu = N_1N_2/(N_1 + N_2)$ and $d_i$ is given by Eq. (10). It is seen that, apart
from the constant $\nu$, $T_2^2$ is the same as $L$ with $d_i$ in place of $x_i$.


EXAMPLE 4  In a certain experiment (the details have been modified for
convenience of presentation) 18 rabbits each received a high dose of insulin
and 18 received a low dose. The blood-sugar was measured at 1, 2 and 3 hours
after each dose. The three readings are denoted by $x_1$, $x_2$ and $x_3$. The aim is
to find what linear combination of these readings would be expected to
discriminate most effectively between a high dose and a low dose of insulin in
another rabbit.

The S.S. and S.P. within populations were as shown in the following table:

              S.S.                              S.P.
    $x_1^2$    $x_2^2$    $x_3^2$      $x_1x_2$    $x_1x_3$    $x_2x_3$
     2677       2358       3223         1278        1814        1966

and the values of $d_i$ (mean low-insulin value - mean high-insulin value) were
7.594, 19.73 and 25.04, respectively. The matrix $(S_{ij})$ is


2677 1278 1814 
1278 2358 1966 
1814 1966 3223 


and its inverse $(S^{ij})$ is proportional to

     3.735   -0.553   -1.765
    -0.553    5.337   -2.945
    -1.765   -2.945    4.679

The values of $b_1$, $b_2$, $b_3$ are therefore proportional to
$d_1S^{11} + d_2S^{21} + d_3S^{31}$, $d_1S^{12} + d_2S^{22} + d_3S^{32}$, and
$d_1S^{13} + d_2S^{23} + d_3S^{33}$ respectively, that is,
to -26.7, 27.4 and 45.6. A good approximation to the best discriminant
would accordingly be

    $L = -3x_1 + 3x_2 + 5x_3$

since 26.7, 27.4 and 45.6 are approximately in the ratio 3:3:5.
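The arithmetic of Example 4 can be checked with a short script. This is a sketch
only: it forms the matrix of cofactors of $(S_{ij})$, which is proportional to the
inverse, and divides by $10^6$ purely to match the scale of the matrix printed
above. The helper name `cofactor` is an assumption of this sketch.

```python
# Sums of squares and products within populations, from Example 4.
S = [[2677, 1278, 1814],
     [1278, 2358, 1966],
     [1814, 1966, 3223]]
d = [7.594, 19.73, 25.04]   # low-dose mean minus high-dose mean

def cofactor(m, i, j):
    """Signed 2x2 minor of the 3x3 matrix m with row i and column j removed."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

# The adjugate is the transposed cofactor matrix; it is proportional to the inverse.
adj = [[cofactor(S, j, i) for j in range(3)] for i in range(3)]

# b_i proportional to sum_j S^{ij} d_j; the factor 1e6 only fixes the scale.
b = [sum(adj[i][j] * d[j] for j in range(3)) / 1e6 for i in range(3)]
print([round(v, 2) for v in b])
```

The three values come out within about 0.1 of the figures -26.7, 27.4 and 45.6
quoted in the example, and so are again roughly in the ratio -3 : 3 : 5.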


EXAMPLE 5  (Nature, 168, 1951, p. 794). In order to discriminate between
fossil skulls of men and chimpanzees, a discriminant function was calculated
using four distinct measurements on the lower milk canine teeth. On the basis
of 44 chimpanzee and 40 human teeth, the function obtained was
$L = x_1 - 749x_2 + 24x_3 + 4.70x_4$. The average value of $L$ turned out to
be +17.6 for the chimpanzee teeth and -5.0 for the human, with a standard
deviation of 2.45. It was therefore concluded that if a tooth of unknown origin
gave an $L$ between 11.5 and 23.7 it might very reasonably be classed as a
chimpanzee's, and if $L$ lay between 1.1 and -11.1 as human. For the famous Taungs
skull, of great antiquity, $L$ turned out to be -7.9 and for the Kromdraai skull
-2.6, so that both these are probably human.


* 13.9 The Distance Between Two Populations  The discriminant function and
Hotelling's generalized $T_2$ are both closely related to a measure of "generalized
distance" between two populations, proposed by Mahalanobis.

Suppose $p$ variates $X_i$ are measured on each sample item from each
population. Let the population means of $X_i$ be $\mu_{1i}$ and $\mu_{2i}$ and let

(13.9.1)    $X_{1i} = \mu_{1i} + \epsilon_i$,    $X_{2i} = \mu_{2i} + \epsilon_i$

where the $\epsilon_i$ have a multivariate normal distribution (the same for both
populations) with means 0 and covariance matrix $(\sigma_{ij})$. If

(13.9.2)    $\delta_i = \mu_{1i} - \mu_{2i}$

the generalized distance is given by

(13.9.3)    $\Delta^2 = \sum_{i,j} \sigma^{ij}\delta_i\delta_j$

where $(\sigma^{ij})$ is the inverse of $(\sigma_{ij})$. A factor $p^{-1}$ is sometimes included on the
right-hand side of Eq. (3) but this makes no essential difference.

Since in practice we have to estimate the population means from the sample
means, the formula for the observed distance becomes

(13.9.4)    $D^2 = \sum_{i,j} \sigma^{ij} d_i d_j$

where

(13.9.5)    $d_i = \bar{x}_{1i} - \bar{x}_{2i}$

The first mean is calculated from a sample of size $N_1$, say, and the second
from a sample of size $N_2$. If $\sigma_{ij}$ is not known, the estimator $s_{ij}$, defined as in
Eqs. (13.8.6) and (13.8.7), must be used. In this case, we speak of the studentized
distance, $D_s$, which is given by

(13.9.6)    $D_s^2 = \sum_{i,j} s^{ij} d_i d_j$

and so is identical, apart from the constant $\nu$, with Hotelling's generalized $T_2^2$,
in Eq. (13.8.15).

If the true Mahalanobis distance $\Delta$ is zero (so that the two populations are
really identical), the quantity $\nu D^2$ has an ordinary $\chi^2$ distribution with $p$ d.f.
If $\Delta$ is not zero, $\nu D^2$ is distributed like non-central $\chi^2$ with parameter of
non-centrality $\nu\Delta^2/2$ (see Appendix A.13).

For the studentized distance, it was shown by Bose and Roy that if $\Delta = 0$,
the quantity

(13.9.7)    $\dfrac{N_1 + N_2 - p - 1}{p} \cdot \dfrac{\nu D_s^2}{N_1 + N_2 - 2}$

has the ordinary $F$ distribution with $p$ and $N_1 + N_2 - p - 1$ d.f. If $\Delta$ is not
zero, it has the non-central $F$ distribution (see (13.7.6)) with non-centrality
parameter $\nu\Delta^2/2$.
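As a numerical illustration, the studentized distance of Eq. (13.9.6), Hotelling's
$T_2^2$ computed directly from Eq. (13.8.12), and the $F$ quantity of Eq. (13.9.7)
can be evaluated side by side. The sketch below uses synthetic normal samples
generated with a fixed seed; the sample sizes, the shift of 0.5 and all variable
names are assumptions made only for this illustration, and NumPy is assumed to
be available.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed; synthetic data only
p, n1, n2 = 3, 12, 15
xa = rng.normal(size=(n1, p))           # stand-in for the sample from population A
xb = rng.normal(size=(n2, p)) + 0.5     # stand-in for the sample from population B

ma, mb = xa.mean(axis=0), xb.mean(axis=0)
S = (xa - ma).T @ (xa - ma) + (xb - mb).T @ (xb - mb)   # Eq. (13.8.7)
s_inv = np.linalg.inv(S / (n1 + n2 - 2))                # inverse of (s_ij), Eq. (13.8.6)

d = ma - mb                                             # Eq. (13.9.5)
nu = n1 * n2 / (n1 + n2)
D2_s = d @ s_inv @ d                                    # Eq. (13.9.6)

# Hotelling's generalized statistic computed directly from Eq. (13.8.12).
m = (n1 * ma + n2 * mb) / (n1 + n2)                     # Eq. (13.8.13)
T2 = n1 * (ma - m) @ s_inv @ (ma - m) + n2 * (mb - m) @ s_inv @ (mb - m)

F = (n1 + n2 - p - 1) / p * nu * D2_s / (n1 + n2 - 2)   # Eq. (13.9.7)
print(T2, nu * D2_s, F)
```

The first two printed numbers agree up to rounding error, which is the content of
Eq. (13.8.15); the third would be referred to the $F$ table with $p$ and
$N_1 + N_2 - p - 1$ d.f.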

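For the bivariate case, if we denote $x_1$ by $x$ and $x_2$ by $y$, and if the correlation
between $x$ and $y$ in the population is measured by $\rho$, we have

    $(\sigma_{ij}) = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}$

    $(\sigma^{ij}) = [\sigma_x^2\sigma_y^2(1 - \rho^2)]^{-1} \begin{pmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\ -\rho\sigma_x\sigma_y & \sigma_x^2 \end{pmatrix}$

so that $D^2$ becomes

(13.9.8)    $D^2 = (1 - \rho^2)^{-1}\left[\dfrac{(\bar{x}_1 - \bar{x}_2)^2}{\sigma_x^2} + \dfrac{(\bar{y}_1 - \bar{y}_2)^2}{\sigma_y^2} - \dfrac{2\rho(\bar{x}_1 - \bar{x}_2)(\bar{y}_1 - \bar{y}_2)}{\sigma_x\sigma_y}\right]$

which, when $\rho = 0$, reduces to the square of the geometrical distance between
the two sample means in the x-y plane (if standardized units for $x$ and $y$ are
used).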


13.10 Stochastic Processes  An important part of modern statistical theory
is concerned with events which change in a random way as time goes on.
An example is the position of a microscopic inert particle suspended in a fluid,
and exhibiting the so-called Brownian motion. Another is the size of a
population affected by births, deaths, immigration, emigration, etc. These
and many other such linked chains of events, proceeding in time and subject to
random fluctuations, are called stochastic processes.

The mathematical model of a stochastic process is a variate $X(t)$ depending
on a parameter $t$ which is usually (in physical applications) the time. Since $t$ is
continuous, the set of possible values of $X(t)$ is non-countable, but in practice
observations are taken at a finite set of values $t_r$ ($r = 1, 2, \ldots, n$). The
corresponding values of $X$, say $x_1, x_2, \ldots, x_n$, are the components of an
$n$-dimensional vector, having a distribution function
$F_n(x_1, \ldots, x_n; t_1, \ldots, t_n)$, so that $X$ is a multivariate random, or
stochastic, variable. One of the simplest examples of a stochastic process is the
"random walk" mentioned in § 7.6. The $r$th step $X_r$, taken at time $t_r$ in either
the positive or negative direction of the x-axis, is independent of all the
previous $X_s$ at times $t_1, t_2, \ldots, t_{r-1}$. The cumulative sum

(13.10.1)    $S_r = X_1 + X_2 + \cdots + X_r$
( r 
is a stochastic process, representing the position of the individual taking the
"walk" at time $t_r$. Thus if A plays a set of gambling games with B, receiving one
dollar from B each time he wins and paying one dollar to B each time he loses,
and if the result of any game is independent of the preceding games, the total
sum held by A at the end of $r$ games, supposing he started off with a fixed sum of
$m$ dollars, is a random walk process. Here $X_r$ is always either +1 or -1, and
sooner or later either A or his opponent (if similarly situated) will be ruined.
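The gambling example can be imitated by simulation. The sketch below assumes a
fair game (each $X_r$ is +1 or -1 with equal chances) and initial sums of 3 and 7
dollars; the passage above fixes neither of these, and the function name
`play_until_ruin` is hypothetical.

```python
import random

def play_until_ruin(m_a, m_b, rng):
    """Follow A's holdings until one player is ruined; return (winner, games played)."""
    holdings = m_a
    total = m_a + m_b
    games = 0
    while 0 < holdings < total:
        holdings += rng.choice((1, -1))   # one step X_r of the random walk (fair game assumed)
        games += 1
    return ("A" if holdings == total else "B"), games

rng = random.Random(1)                    # fixed seed so that the run is repeatable
results = [play_until_ruin(3, 7, rng) for _ in range(2000)]
share_a = sum(1 for winner, _ in results if winner == "A") / len(results)
mean_games = sum(games for _, games in results) / len(results)
print(share_a, mean_games)
```

Every simulated series of games ends with one player ruined. For a fair game the
proportion of series won by A should be near $3/(3 + 7) = 0.3$, a standard result
for the symmetric random walk with two absorbing barriers.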

Sequential binomial sampling is another, and more respectable, example, in
which each step represents the result of sampling one more item from the
population (usually referred to as a "lot"). The step is in one direction or
another according as the result of the inspection is a "success" or a "failure."
The total number of steps taken is a stochastic variable representing the total
number of sample items required to arrive at one or other of two possible
decisions regarding the population sampled (for instance, to accept the whole
lot or to reject the whole lot).


13.11 Markov Processes  A stochastic process is said to be of the Markov
type if the value of $X$ at any time $t_r$ depends at most on the value at the
immediately preceding available time $t_{r-1}$. No earlier history of the process
adds anything further to the probable future history of $X$. The joint probability
for the observed set of values $x_1, x_2, \ldots, x_n$ is therefore given by

(13.11.1)    $P(x_1, x_2, \ldots, x_n) = P(x_1) \cdot P(x_2|x_1) \cdot P(x_3|x_2) \cdots P(x_n|x_{n-1})$

                                      $= P(x_1) \prod_{r=2}^{n} P(x_r|x_{r-1})$

A Markov process is defined by the initial probability distribution $P(x_1)$ and
the conditional distribution $P(x_r|x_{r-1})$ for any arbitrary choice of the times $t_r$.

It follows that

(13.11.2)    $P(x_r|x_{r-2}) = \int_{(x_{r-1})} P(x_r|x_{r-1}) P(x_{r-1}|x_{r-2})$

where the integral stands for a sum over the possible values of $x_{r-1}$ (if $X$ is
discrete) or the ordinary integral over the whole domain of $x_{r-1}$ (if $X$ is
continuous). This relation is known as the Chapman-Kolmogorov equation.
By a repeated application of the equation we may obtain $P(x_r|x_1)$ and then

(13.11.3)    $P(x_r) = \int_{(x_1)} P(x_r|x_1) P(x_1)$

When the variate $X$ is discrete (as in many applications), the process is called
a Markov chain, and a matrix notation is convenient. We suppose that at each
time $t_r$, $X$ can take one of a finite number $n$ of possible values $x_1, x_2, \ldots, x_n$,
representing $n$ possible states of the system. The probability that $X$ changes
from $x_i$ to $x_j$ between $t_{r-1}$ and $t_r$ will be denoted by $p_{ij}$ (called a transition
probability). If we know the matrix $P$ of transition probabilities, and the initial
value of $X$, we can calculate the probability for any possible value at any
future time.

Note that in this matrix

    $P = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}$

the probabilities in each row (but not necessarily in each column) add up to 1.
This is because in whatever state the system may be at time $t_{r-1}$ it must be in
one and only one of the $n$ possible states at time $t_r$. However, since the transition
probabilities refer only to transitions from $x_i$ to $x_j$, a similar argument does
not apply to the columns.

For a Markov chain, Eqs. (2) and (3) become

(13.11.4)    $P(t_r|t_{r-2}) = P^2$,    $P(t_r) = p'P^{r-1}$

where $P(t_r)$ denotes the probability of the various states at time $t_r$ and $p'$ is a
row vector of the probabilities at $t_1$ for the same $n$ states.


EXAMPLE 6  A system can be in one of two possible states. Initially the
chance is the same for each, and at each transition the probability matrix is

    $P = \begin{pmatrix} 1/3 & 2/3 \\ 1/2 & 1/2 \end{pmatrix}$

What are the probabilities for the two states after three steps?
We have $p' = [1/2, 1/2]$,

    $P^2 = \begin{pmatrix} 4/9 & 5/9 \\ 5/12 & 7/12 \end{pmatrix}$,    $P^3 = \begin{pmatrix} 23/54 & 31/54 \\ 31/72 & 41/72 \end{pmatrix}$

    $p'P^3 = [185/432, 247/432]$

The required probabilities are therefore 185/432 and 247/432.

If written as decimals, the rows of $P^3$ are [0.426, 0.574] and [0.431, 0.569],
which are nearly the same. A basic theorem on Markov chains states that if
the matrix $P$ is regular (that is, if some power of $P$ has no zero elements) then,
as $n$ increases, $P^n$ tends to a matrix $Q$, each row of which consists of the same
probability vector $q'$, with no zero element. Furthermore, whatever the initial
probability vector $p'$, $p'P^n \to q'$, and $q'$ is a unique fixed probability vector
satisfying the relation

(13.11.5)    $q'P = q'$

In the example above, the vector $q'$ is [3/7, 4/7]. The interpretation of this
theorem is that after a great many steps, the probability that the system is in a
state $x_j$ is very nearly equal to the $j$th element of $q'$ regardless of the initial
probabilities of the various states.
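The calculation in Example 6 can be repeated in exact fractions. The sketch below
takes the transition matrix to have rows [1/3, 2/3] and [1/2, 1/2] and the initial
vector to be [1/2, 1/2]; this reading is an inference, chosen because it
reproduces the three-step probabilities with numerator 185 and the decimal rows
0.426, 0.574 quoted in the example. The helper `matmul` is hypothetical.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

P = [[Fr(1, 3), Fr(2, 3)],
     [Fr(1, 2), Fr(1, 2)]]       # assumed transition matrix of Example 6
p0 = [[Fr(1, 2), Fr(1, 2)]]      # both states equally likely at the start

P3 = matmul(matmul(P, P), P)     # three-step transition probabilities
after_three = matmul(p0, P3)[0]  # p'P^3, as in Eq. (13.11.4)

q = [Fr(3, 7), Fr(4, 7)]         # candidate fixed vector
is_fixed = matmul([q], P)[0] == q   # checks q'P = q', Eq. (13.11.5)
print(after_three, is_fixed)
```

Working in fractions avoids rounding, so the fixed-vector relation can be tested
by exact equality rather than to a tolerance.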

Another theorem states that in a regular Markov chain, with transitions at 
unit time intervals, the average time it takes to return to a given state, having 
once been there, is the reciprocal of the limiting probability of being in that state. 


13.12 Stationary Processes  A stationary process is one whose distribution
function $F_n(x_1, x_2, \ldots, x_n; t_1, t_2, \ldots, t_n)$, for any set of times
$t_1, t_2, \ldots, t_n$, depends only on the $n - 1$ intervals $t_r - t_1$
($r = 2, 3, \ldots, n$) and so is independent of the absolute times. Translation
along the time axis makes no difference to the probabilities in a stationary
process. An example is a quality control chart where the variate concerned is
satisfactorily "in control." Also a Markov chain, after a considerable number of
steps, has become practically stationary. Stationary processes represent a kind
of stochastic equilibrium and are therefore of importance in many practical
situations, such as arise in problems of communication engineering.

A process in which the mean is constant but the variance may change is said
to be stationary to the first order.

If the mean and variance are constant, as well as the covariance between
values at a fixed interval apart, the process is stationary to the second order,
and so on.

Let us consider the case when all the observations are taken at regularly
spaced times, so that $t_r - t_{r-1} = T$ for all $r$. For a second-order stationary
process

(13.12.1)    $E(X_r) = \mu$,    $E[(X_r - \mu)(X_{r-1} - \mu)] = \sigma^2\rho(T)$

where $\sigma^2$ is the variance of any $X_r$ and $\rho(T)$ is a function of the fixed interval $T$.
This function is called the autocorrelation coefficient, and is the correlation
coefficient for pairs of values of $X$ separated by the interval $T$. Since a
multivariate normal distribution depends only on the first and second moments, it
follows that a normal process which is stationary to the second order is also
completely stationary.

The simplest case of a stationary process is a sequence of independent
observations such as heads or tails in repeated tosses of a coin. Here $\rho(T) = 0$ for
any non-zero interval $T$. Another example is a linear Markov process, defined
by

(13.12.2)    $X_{r+s} = \lambda_s X_r + Y_{r+s}$

where $Y_{r+s}$ is a sequence of uncorrelated variates independent of $X_r$, for all $s > 0$.

Since $E(X_r) = \mu$ for all $r$, we have from Eq. (2),

    $\mu = \lambda_s\mu + \mu_{r+s}$

where $\mu_{r+s}$ is the expectation of $Y_{r+s}$. Then

(13.12.3)    $\mu_{r+s} = \mu(1 - \lambda_s)$

If we multiply Eq. (2) by $X_r$ and then take expectations, we get, on putting $\sigma^2$
for the variance of $X_r$, and $\rho_s$ for the correlation coefficient between $X_r$ and $X_{r+s}$,

    $\sigma^2\rho_s + \mu^2 = \lambda_s(\sigma^2 + \mu^2) + \mu \cdot \mu_{r+s}$

This gives, after substituting from Eq. (3),

    $\sigma^2\rho_s = \lambda_s\sigma^2$

or

(13.12.4)    $\lambda_s = \rho_s$

Since the process is Markov, the partial correlation between, say, $X_1$ and $X_3$
(eliminating $X_2$) is zero. By Eq. (13.4.7) this implies that
$\rho_{13} - \rho_{12}\rho_{23} = 0$, where $\rho_{13}$ is the ordinary correlation
coefficient for $X_1$ and $X_3$. Since $\rho_{12}$ and $\rho_{23}$ are both equal to
$\rho_1$ ($s$ being 1 in both cases), it follows that $\rho_{13} = \rho_1^2$, and,
in general,

(13.12.5)    $\rho_s = \rho_1^s$

This relationship among the correlation coefficients is characteristic of the
stationary linear Markov process.
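The geometric decay of Eq. (13.12.5) can be seen in a simulated series. The
sketch below generates a linear Markov process with $\lambda_1 = 0.6$ and standard
normal $Y$; these choices, the series length, the discarded initial stretch and
the function name `sample_autocorrelation` are all assumptions of the
illustration, not part of the text.

```python
import random

def sample_autocorrelation(x, lag):
    """Sample correlation between values of the series separated by `lag` steps."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var

rng = random.Random(0)          # fixed seed so that the run is repeatable
lam = 0.6                       # assumed value of lambda_1 (equal to rho_1 by Eq. (13.12.4))
x = [0.0]
for _ in range(50000):
    x.append(lam * x[-1] + rng.gauss(0.0, 1.0))   # X_{r+1} = lambda_1 X_r + Y_{r+1}
x = x[1000:]                    # drop the start, before the process has settled down

for s in (1, 2, 3):
    print(s, round(sample_autocorrelation(x, s), 3), round(lam ** s, 3))
```

Each printed pair compares the sample autocorrelation at lag $s$ with
$\rho_1^s$; the two columns should agree to within sampling error.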

We cannot enter further into the topic of stochastic processes, with its many 
applications to economic time-series, population and genetic problems, com- 
munication theory and traffic engineering, cosmic ray showers and thermo- 
dynamics, to mention only a few of the directions in which the theory has been 
applied. Those who wish some further insight may consult reference [9] which 
contains a fairly full bibliography. See also [10]. 


PROBLEMS
A. (§§ 13.1-13.4)

1. Given that, for a group of children between the ages of 8 and 14, the ordinary
coefficients of correlation between intelligence and school achievement, between
intelligence and age, and between school achievement and age, are 0.80, 0.70 and 0.60,
respectively, calculate the correlation coefficient between intelligence and school
achievement, eliminating the effect of age.

2. In Problem A-3 of Chapter 12, calculate the three partial correlation coefficients
and also the multiple correlation coefficient of Y on X and Z. Is this multiple
correlation significantly different from zero?

3. In calculating correlation coefficients between three variables, a student obtained
the values $r_{01} = 0.6$, $r_{02} = 0.4$, $r_{12} = 0.7$. Is there good reason to suspect these
values? Why? Hint: Calculate $r_{0.12}$ from the data.
4. In Example 3 of § 13.4, calculate the partial coefficients of correlation between
Хо and Xs and between Xi and Xs. 

5. From the data of Problem A-5 of Chapter 12, calculate all the ordinary
coefficients of correlation, the partial coefficients of correlation between Y and each of
$X_1$, $X_2$, and $X_3$, and the multiple correlation of Y on $X_1$, $X_2$ and $X_3$.

6. (Pearl and Surface) In a biometric study of egg production in the domestic fowl,
measurements of length, breadth and weight ($X_0$, $X_1$, $X_2$, respectively) were made on
453 eggs. From all these the value of $r_{01.2}$ was -0.8955. If the 42 eggs weighing from
53 to 53.9 gm are considered alone, the ordinary coefficient of correlation $r_{01}$ between
length and breadth is -0.9117; similarly for the 46 eggs between 56 and 56.9 gm,
$r_{01}$ = -0.8911, and for the 13 eggs between 62 and 62.9 gm, $r_{01}$ = -0.8739. Show
that the weighted mean of these values of $r_{01}$ is nearly equal to $r_{01.2}$ (compare § 13.4,
immediately before Example 3).

7. If all the ordinary correlation coefficients in a set of $p + 1$ variates $X_0, X_1, \ldots, X_p$
are equal to $r$, show that the partial correlation coefficients $r_{01.2\ldots p}$, $r_{02.1\ldots p}$, etc. are
each equal to $r/[1 + (p - 1)r]$ and that the multiple correlation of $X_0$ on $X_1, X_2, \ldots, X_p$ is
given by

    $1 - r^2_{0.12\ldots p} = \dfrac{(1 - r)(1 + pr)}{1 + (p - 1)r}$

Hint: Show that the determinant $d(R)$ is equal to $(1 - r)^p(1 + pr)$ and that
$R_{00} = (1 - r)^{p-1}[1 + (p - 1)r]$, $R_{01} = -r(1 - r)^{p-1}$, etc.

8. With three variates $X_0$, $X_1$ and $X_2$, show that the correlation coefficient between
the residuals $x_{0.12}$ and $x_{1.20}$ is equal and opposite to that between $x_{0.2}$ and $x_{1.2}$. Hint:
$x_{0.12}$ is the same as the $v_0$ of § 13.2, and $x_{1.20}$ is the residual for the multiple regression
of $X_1$ on $X_2$ and $X_0$.


B. (§§ 13.5-13.7)

1. Write out the joint density function for the trivariate normal distribution, taking
$x_0 = x$, $x_1 = y$, $x_2 = z$ and putting $\rho_{01} = \rho_{xy}$, etc. The variables may be supposed all
standardized.

2. Show that, if $X$ and $Y$ are independent normal variates with zero means and
variances $\sigma_x^2$, $\sigma_y^2$ respectively, the bivariate normal surface is cut by a plane through
the z-axis in a curve for which the points of inflexion lie on the elliptic cylinder
$x^2/\sigma_x^2 + y^2/\sigma_y^2 = 1$. Hint: The equation of the plane is $y = mx$. The points of
inflexion are given by $d^2z/dx^2 = 0$, where $z = f(x, y)$.

3. If the variates $X_1, X_2, \ldots, X_N$ ($-\infty < X_i < \infty$) are independent and have a
joint density function which is a function of $x_1^2 + x_2^2 + \cdots + x_N^2$ only, show that the
$X_i$ must be normal with mean zero and common variance. Hint: The functional
equation $f(x) f(y) = f(x + y)$ has the solution $f(x) = e^{cx}$.

4. Let $X_1$, $X_2$ have a joint bivariate normal distribution with means zero and
covariance matrix

    $C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}$

Write out the density function for $X_2$, given $X_1$.
Hint: $f(x_2|x_1) = f(x_1, x_2)/f(x_1)$. When the variates are not standardized, the matrix
$A^{-1}$ of Eq. (13.5.4) is replaced by the covariance matrix.

5. If $X_1$, $X_2$, $X_3$ have a joint trivariate normal distribution, with covariance matrix

    $K = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}$

calculate the expectation of $X_1$, given $X_2$ and $X_3$.

6. In the following table from Student's 1908 paper [11], $x_1$ and $x_2$ represent
additional hours of sleep obtained by the use of soporific drugs A and B respectively
on certain patients.

    Patient    $x_1$     $x_2$
       1        1.9      0.7
       2        0.8     -1.6
       3        1.1     -0.2
       4        0.1     -1.2
       5       -0.1     -0.1
       6        4.4      3.4
       7        5.5      3.7
       8        1.6      0.8
       9        4.6      0.0
      10        3.4      2.0

Assuming that each pair of observations of $x_1$ and $x_2$ for a given patient is from a
bivariate normal distribution, use the Hotelling $T$ test to test the hypothesis at
significance level 0.01 that neither drug really produces any soporific effect. (The following
results of computation may be used:

    $\bar{x}_1 = 2.33$, $\bar{x}_2 = 0.75$, $S_{11} = 36.08$, $S_{22} = 28.80$, $S_{12} = 25.63$

The null hypothesis is that $\mu_1 = \mu_2 = 0$.)

C. (§§ 13.8-13.9)
1. Two treatments were applied to experimental forage plots, in 15 randomized 


blocks, each consisting of two plots, so that both treatments were used once in each 
block. The variable was the amount of Dutch clover in the forage stand, and this 


20.44, S22 = 6.41, 512 = 4.89. Calculate the best discriminant function of the form 


                                              Course (1)    Course (2)

    $N$                                          111           257
    $\bar{x}_1$                                  87.640        92.397
    $\bar{x}_2$                                  31.081        56.074
    $\bar{x}_3$                                  1.1586        1.2689
    $S(x_1 - \bar{x}_1)^2$                       53136         194356
    $S(x_2 - \bar{x}_2)^2$                       11616         15864
    $S(x_3 - \bar{x}_3)^2$                       51.85         120.39
    $S(x_1 - \bar{x}_1)(x_2 - \bar{x}_2)$        4863          17878
    $S(x_1 - \bar{x}_1)(x_3 - \bar{x}_3)$        485.5         1844.0
    $S(x_2 - \bar{x}_2)(x_3 - \bar{x}_3)$        243.8         836.6

Calculate the best discriminant function for distinguishing between the courses. If
a new student comes along with the scores $x_1 = 80$, $x_2 = 40$, $x_3 = 1.5$, to which
course should he be assigned? Hint: Calculate $L_1 = \sum b_i\bar{x}_{1i}$ for course (1), and
$L_2 = \sum b_i\bar{x}_{2i}$ for course (2). If the observed $L < \tfrac12(L_1 + L_2)$, the student should be
assigned to course (1).

3. R. A. Fisher [13] has discussed the separation of two species of iris, namely, (1)
versicolor and (2) setosa. The criteria are $X_1$ (sepal length), $X_2$ (sepal width), $X_3$ (petal
length) and $X_4$ (petal width), all in centimeters. The data on 50 specimens of (1) and
50 specimens of (2) are summarized as follows:

    $\bar{x}_1 = (5.936, \ldots, 1.326)'$,    $\bar{x}_2 = (5.006, \ldots, 0.246)'$

    $98(s_{ij}) = \begin{pmatrix} 19.1434 & 9.0356 & 9.7634 & 3.2394 \\ 9.0356 & 11.8658 & 4.6232 & 2.4746 \\ 9.7634 & 4.6232 & 12.2978 & 3.8794 \\ 3.2394 & 2.4746 & 3.8794 & 2.4604 \end{pmatrix}$

The inverse matrix is

    $(s^{ij}) = 98 \begin{pmatrix} 0.11872 & -0.06687 & -0.08162 & 0.03964 \\ -0.06687 & 0.14527 & 0.03341 & -0.11075 \\ -0.08162 & 0.03341 & 0.21936 & -0.27202 \\ 0.03964 & -0.11075 & -0.27202 & 0.89455 \end{pmatrix}$

Calculate the discriminant function, and state the criterion for allotting another
specimen of iris to the one species or the other.

4. Calculate Hotelling's $T_2^2$ for the data of Problem C-3, and test its significance
by the $F$ test. Hint: On the null hypothesis that the two populations have the same
vectors of means, $(\mu_{1i})$ and $(\mu_{2i})$, the quantity
$T_2^2(N_1 + N_2 - p - 1)/[p(N_1 + N_2 - 2)]$ has the $F$ distribution with $p$ and
$N_1 + N_2 - p - 1$ d.f.


D. (§§ 13.10-13.12)

1. In Example 6 of § 13.11, calculate $P^4$, and hence obtain the probabilities for the
two possible states of the system after four transitions.

2. If the transition probabilities for a Markov chain are given by 


0 1 0 0 
1 0 0 0 
0 0 + $ 
g Q i i 


and the initial state has probabilities (3, $, №, №) what are the probabilities after one, 
two and three transitions? Why is there no limiting probability distribution in this 
example? 

3. Compute the fixed probability vector for the following matrices: 


0.1 0.9 : : Н 
0.6 0.41 ° 
4. Two urns, labelled (1) and (2), each contain $n$ balls, either white or black.
There are $n$ black balls and $n$ white balls altogether, but the compositions of the urns
may differ. A transition consists in choosing a ball at random from each urn and
interchanging them, putting the ball from (1) into (2) and the ball from (2) into (1).
Each state is completely specified by the number of black balls in urn number (1).
If at any state of the process there are $j$ black balls in urn (1), what are the transition
probabilities?

If initially $j = 0$ and $n = 4$, what are the probable compositions of the urns after
three transitions?

Show that the vector of probabilities $q' = (p_0, p_1, p_2, \ldots, p_n)$, where
$p_j = \binom{n}{j}^2 \Big/ \binom{2n}{n}$, is the fixed vector for this transition matrix.

5. A Markov chain is said to be ergodic if it is possible to go from every state to
every other state. A regular chain is necessarily ergodic, since if the $n$th power of $P$
contains no zeros, there is a non-zero probability of every possible transition in $n$
steps. Show that the chain represented by the transition matrix


= owo 
Om Om 


t Ore © 
Om Om 


is ergodic but not regular. 

6. Suppose that the following is an extract from a stationary process, the values
being taken at successive times separated by the fixed interval $T$: -5, -6, -2, 4, 7,
3, 1, -5, -1, 2. Estimate the auto-correlation coefficient by calculating the sample
variance of the observations and the sample covariance between successive pairs.
Hint: Take the variance as $[V(X_i)V(X_{i-1})]^{1/2}$. To estimate $V(X_i)$ use the last nine
observations, and to estimate $V(X_{i-1})$ use the first nine. There are nine pairs in the
expression for the covariance.
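Appendix A
MATHEMATICAL APPENDIX


A.1 The Limit of $(1 + x/n)^n$ for Fixed $x$, as $n \to \infty$
By the binomial theorem,

(A.1.1)    $\left(1 + \dfrac1n\right)^n = 1 + n\cdot\dfrac1n + \dfrac{n(n-1)}{2!}\cdot\dfrac{1}{n^2} + \cdots + \dfrac{n!}{n!}\cdot\dfrac{1}{n^n}$

           $= 1 + 1 + \dfrac{1}{2!}\left(1 - \dfrac1n\right) + \cdots + \dfrac{1}{n!}\left(1 - \dfrac1n\right)\left(1 - \dfrac2n\right)\cdots\left(1 - \dfrac{n-1}{n}\right)$

The $(p + 1)$th term in this expansion (the term involving $1/p!$) is

(A.1.2)    $\dfrac{1}{p!}\left(1 - \dfrac1n\right)\left(1 - \dfrac2n\right)\cdots\left(1 - \dfrac{p-1}{n}\right)$,    $p \le n$

This is always positive, and increases for fixed $p$ as $n$ increases (since the
subtracted terms get smaller). Also the number of such terms in $(1 + 1/n)^n$
increases with $n$. For both reasons $(1 + 1/n)^n$ increases with $n$, so that it must
either have a limit or tend to $+\infty$. But we can show that it has a limit, as
follows: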


All the factors $1 - \dfrac1n$, $1 - \dfrac2n$, ..., $1 - \dfrac{n-1}{n}$ are less than 1, so that

    $\left(1 + \dfrac1n\right)^n < 1 + 1 + \dfrac{1}{2!} + \dfrac{1}{3!} + \cdots + \dfrac{1}{n!}$

                 $< 1 + \left(1 + \dfrac12 + \dfrac{1}{2^2} + \cdots + \dfrac{1}{2^{n-1}}\right)$

But the sum of this geometric progression (in parentheses) to infinity is
$1/(1 - \tfrac12) = 2$, so that $(1 + 1/n)^n < 3$, however large $n$ may be. Also, the sum
is obviously greater than 2 (which is the sum of the first two terms alone in Eq.
(1)). The limit is therefore a number between 2 and 3 which may be denoted by $e$.
It is actually an irrational number, 2.71828 . . . .
The expression (A.1.2), with $p$ fixed, tends to the value $1/p!$ as $n \to \infty$. The
number $e$ is therefore the sum $1 + 1 + 1/2! + 1/3! + \cdots$, and this sum may
be calculated to any required degree of accuracy by taking enough terms. On an
electronic digital computer it has been obtained to about 60,000 decimal places.
If we define $\log x$ by the equation

(A.1.3)    $\log x = \int_1^x \dfrac{dt}{t}$,    $x > 0$

and define the exponential function $\exp x$ as the inverse of the logarithmic
function, so that

(A.1.4)    $x = \exp y$ if $y = \log x$,

then the following argument indicates that $\exp x$ is the same as the above
number $e$ raised to the power $x$, written $e^x$.


By Eq. (3), $\log(1 + xt) = \int_1^{1+xt} \dfrac{du}{u}$, so that, on differentiating the integral
with respect to $t$ (see § A.9),

    $\dfrac{d}{dt}\log(1 + xt) = \dfrac{1}{1 + xt}\cdot\dfrac{d}{dt}(1 + xt) = \dfrac{x}{1 + xt}$

for $t > 0$.
But by the definition of the derivative this relation merely states that

    $\lim_{h \to 0} \dfrac{\log(1 + xt + xh) - \log(1 + xt)}{h} = \dfrac{x}{1 + xt}$

and if we now let $t \to 0$ from above,

    $\lim_{h \to 0} h^{-1}\log(1 + xh) = x$

On writing $h = 1/k$, this becomes

    $\lim_{k \to \infty} k\log(1 + x/k) = x$

that is,

    $\lim_{k \to \infty} \log(1 + x/k)^k = x$

Hence, by Eq. (4),

(A.1.5)    $\lim_{k \to \infty} (1 + x/k)^k = \exp x$

If we suppose that $k \to \infty$ through integral values only, we have the result

(A.1.6)    $\lim_{n \to \infty} (1 + x/n)^n = \exp x$

It follows from Eq. (6) and the earlier part of this section that $\exp 1 = e$, so
that $\log e = 1$. The quantity $e^x$ for any real $x$ is defined by $\log e^x = x\log e = x$,
in accordance with the usual convention for indices, so that $e^x = \exp x$ and we
obtain the required result, namely,

(A.1.7)    $\lim_{n \to \infty} (1 + x/n)^n = e^x$
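The convergence in Eq. (A.1.7) is easily watched numerically. In the sketch
below the value $x = 1.5$ and the particular values of $n$ are arbitrary choices
made for illustration.

```python
import math

x = 1.5                                   # arbitrary fixed x for the illustration
ns = (1, 10, 100, 10_000, 1_000_000)
values = [(1 + x / n) ** n for n in ns]   # (1 + x/n)^n for increasing n

for n, approx in zip(ns, values):
    # The last column is the shortfall from exp(x); it shrinks roughly like 1/n.
    print(n, approx, math.exp(x) - approx)
```

For positive $x$ the successive values increase towards $e^x$, in the same way as
$(1 + 1/n)^n$ was shown to increase towards $e$ in the first part of this section.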


A.2 Stirling's Approximation to $n!$

Factorials are not very convenient for mathematical manipulation, and it is
often useful to replace $n!$ by an approximation. The most common
approximation is Stirling's, namely,

(A.2.1)    $n! \approx (2\pi)^{1/2} n^{n+1/2} e^{-n}$

or, equivalently,

(A.2.2)    $\log n! \approx \tfrac12\log(2\pi) + (n + \tfrac12)\log n - n$

The meaning of the approximate equality here is that the ratio of the two sides
tends to 1 as $n \to \infty$. The accuracy of the approximation may be gauged by the
following examples:

    $n = 5$,    $5! = 120$,    $(2\pi)^{1/2} 5^{5.5} e^{-5} = 118.02$

    $n = 10$,    $10! = 3,628,800$,    $(2\pi)^{1/2} 10^{10.5} e^{-10} = 3,598,700$

The relative error is roughly $1/(12n)$, and therefore diminishes as $n$ increases.
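These figures, and the rule of thumb for the relative error, can be reproduced
with a few lines of code. The function name `stirling` is an assumption of this
sketch, and the case $n = 20$ is added only as a further illustration.

```python
import math

def stirling(n):
    """Stirling's approximation (2*pi)^(1/2) * n^(n + 1/2) * e^(-n) to n!."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

for n in (5, 10, 20):
    exact = math.factorial(n)
    approx = stirling(n)
    rel_err = (exact - approx) / exact
    # Compare the observed relative error with the rough value 1/(12n).
    print(n, exact, round(approx, 2), round(rel_err, 5), round(1 / (12 * n), 5))
```

The approximation always falls a little short of $n!$, and the shortfall as a
fraction of $n!$ stays close to $1/(12n)$.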

We will establish Stirling's result in the following form:

(A.2.3)    $\log n! = (n + \tfrac12)\log n - n + C_n$

where $\tfrac34 + (4n)^{-1} < C_n < 1$ and then show that $\lim C_n = \tfrac12\log(2\pi)$.

Consider the curve $y = \log x$ between $x = 1$ and $x = n$ (Figure 55). The
area under the curve is given by

(A.2.4)    $A = \int_1^n \log x \, dx = n\log n - n + 1$


FIG. 55  STIRLING APPROXIMATION TO $n!$
If the tops of the ordinates at $x = 1, 2, 3, \ldots$ are joined by chords, the area
under the chords will be less than that under the curve, since the curve is
everywhere concave to the x-axis. This area is a sum of trapeziums, the area of the
trapezium between $k$ and $k + 1$ being $\tfrac12[\log k + \log(k + 1)]$. The total area
under the chords is

    $\tfrac12[(\log 1 + \log 2) + (\log 2 + \log 3) + \cdots + (\log(n - 1) + \log n)]$
    $= \log 1 + \log 2 + \log 3 + \cdots + \log n - \tfrac12(\log 1 + \log n)$
    $= \log n! - \tfrac12\log n$.

Since this is less than $A$, given by Eq. (4), we have the inequality

(A.2.5)    $\log n! < (n + \tfrac12)\log n - n + 1$

This establishes the upper bound on $C_n$ in Eq. (3).
If we draw tangents to the curve at P(x — К) and Q(x = k + 1) and if the 


А ; d 
tangent at P meets the ordinate at О in В, the slope of PB is 1/k (since E log x 
dx 


= us) and therefore NB = MP + ИК = log + К. Similarly MA = log 


(k + 1) — I/(k + 1). The areas MA ОМ and MPBN are both greater than the 
area under the curve PQ between MP and NQ, and the mean of the two will 
also be greater. Since МАОМ = i[2log(k + 1) — I/(k + 1] and MPBN ' 
= iD logk + I/k], the mean of these two is [log k + log(k + 1)] + 4[1/k 
= I/(k + 1]. Summing for all k from 1 to n — 1, we see that the area under 
the curve is Jess than log n! — Mog n + 4(1 — 1/n). We have therefore from 
Eq. (4) the inequality 
(А.2.6) log n! > (п + Dlogn — n + i +2 
п 
which establishes the lower bound on C,. Clearly С, lies between i(n = 2) and 
1. Also as n increases C, decreases (since the difference between the area under 
the curve and that under the set of chords is 1 — C, and this difference increases 
with п). Hence C, must approach a limit C as n > оо. By using Wallis's 
formula (see Appendix A.8), namely, 


(A.2.7) $\lim_{n \to \infty} \dfrac{2^{2n}(n!)^2}{(2n)!\, n^{1/2}} = \pi^{1/2}$

we can evaluate C as $\tfrac{1}{2}\log(2\pi) = 0.9189$. For, by Eq. (3),

$n! = n^{n + 1/2} e^{-n} e^{C_n}, \qquad (2n)! = (2n)^{2n + 1/2} e^{-2n} e^{C_{2n}}$

so that Eq. (7) becomes

$\lim_{n \to \infty} \dfrac{2^{2n}\, n^{2n + 1} e^{-2n} e^{2C_n}}{n^{1/2} (2n)^{2n + 1/2} e^{-2n} e^{C_{2n}}} = \pi^{1/2}$

Since $\lim C_n = \lim C_{2n} = C$ this becomes

$e^{C} = (2\pi)^{1/2}$
or
$C = \tfrac{1}{2}\log(2\pi)$
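
The bounds and the limit obtained above can be checked numerically. The following is a minimal sketch, assuming that Eq. (3) has the form log n! = (n + 1/2) log n - n + C_n, as the inequalities (A.2.5) and (A.2.6) suggest; the function names are illustrative only.

```python
import math


def stirling_remainder(n: int) -> float:
    """Return C_n, assuming log n! = (n + 1/2) log n - n + C_n."""
    # math.lgamma(n + 1) equals log n! for positive integers n.
    return math.lgamma(n + 1) - (n + 0.5) * math.log(n) + n


def within_bounds(n: int) -> bool:
    """Check 3/4 + 1/(4n) < C_n < 1, i.e. inequalities (A.2.6) and (A.2.5)."""
    c_n = stirling_remainder(n)
    return 0.75 + 1.0 / (4 * n) < c_n < 1.0


LIMIT_C = 0.5 * math.log(2 * math.pi)  # the limit C = (1/2) log(2 pi)

if __name__ == "__main__":
    for n in (2, 5, 10, 100, 1000):
        c_n = stirling_remainder(n)
        print(n, round(c_n, 6), within_bounds(n), round(c_n - LIMIT_C, 6))
```

The printed differences shrink as n grows, which is the numerical counterpart of the statement that C_n decreases towards C.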


A.3 Improper Integrals 
If f(x) is a continuous function of x for x ≥ a and if

(A.3.1) $\lim_{b \to \infty} \int_a^b f(x)\,dx \quad (= l)$

exists, then the improper integral $\int_a^\infty f(x)\,dx$ is said to converge and has the
value l. If it does not converge, the integral either diverges to $+\infty$ or $-\infty$, or
oscillates. Similarly, if f(x) is continuous for x ≤ b and if

(A.3.2) $\lim_{a \to -\infty} \int_a^b f(x)\,dx \quad (= k)$

exists, the integral $\int_{-\infty}^b f(x)\,dx$ converges and is equal to k.

If both integrals $\int_{-\infty}^{a}$ and $\int_{a}^{\infty}$ converge and have the values k and l
respectively, then $\int_{-\infty}^{\infty} f(x)\,dx$ converges and has the value k + l.

The Cauchy principal value of the integral $\int_{-\infty}^{\infty} f(x)\,dx$ is given by

(A.3.3) $\lim_{c \to \infty} \int_{-c}^{c} f(x)\,dx$

This limit may exist even though the two separate integrals do not. Thus
$\int_{-c}^{c} \frac{x\,dx}{x^2 + 1} = 0$ for all real values of c, but both integrals
$\int_{-\infty}^{0} \frac{x\,dx}{x^2 + 1}$ and $\int_{0}^{\infty} \frac{x\,dx}{x^2 + 1}$
diverge, since the indefinite integral of f(x) is here $\tfrac{1}{2}\log(x^2 + 1)$.
The improper integral $\int_{-\infty}^{\infty} \frac{x\,dx}{x^2 + 1}$ therefore diverges, but its Cauchy principal
value is zero.
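
The example can be reproduced with a few lines of arithmetic. This sketch uses only the indefinite integral (1/2) log(x^2 + 1) quoted above; the function names are hypothetical.

```python
import math


def antiderivative(x: float) -> float:
    """Indefinite integral of x / (x^2 + 1)."""
    return 0.5 * math.log(x * x + 1.0)


def symmetric_integral(c: float) -> float:
    """Integral of x / (x^2 + 1) from -c to c."""
    return antiderivative(c) - antiderivative(-c)


def one_sided_integral(b: float) -> float:
    """Integral of x / (x^2 + 1) from 0 to b; grows without bound as b increases."""
    return antiderivative(b) - antiderivative(0.0)


if __name__ == "__main__":
    for c in (1.0, 10.0, 1e3, 1e6):
        print(c, symmetric_integral(c), one_sided_integral(c))
```

The symmetric integral stays at zero for every c, while the one-sided integral keeps increasing like log b, so only the principal value exists.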

Another type of improper integral is that in which f(x) becomes infinite
at some point or points of the range of integration; by splitting up the range
into sub-intervals marked off by these points we need consider only the cases
when f(x) becomes infinite at either the lower bound or the upper bound of
integration.

If $f(x) \to \infty$ as $x \to a$ from above, but is otherwise continuous in the
interval from a to A, and if

(A.3.4) $\lim_{\varepsilon \to +0} \int_{a + \varepsilon}^{A} f(x)\,dx = l$

then the integral $\int_a^A f(x)\,dx$ converges and has the value l. Similarly, if $f(x) \to \infty$
as $x \to A$ from below, but is otherwise continuous from a to A, and if

(A.3.5) $\lim_{\varepsilon \to +0} \int_{a}^{A - \varepsilon} f(x)\,dx = k$

then $\int_a^A f(x)\,dx$ converges and has the value k.


If f(x) becomes infinite at a value x = c between x = a and x = b, we can
define the Cauchy principal value of the integral, if it exists, by

(A.3.6) $\int_a^b f(x)\,dx = \lim_{\varepsilon \to +0} \left( \int_a^{c - \varepsilon} f(x)\,dx + \int_{c + \varepsilon}^{b} f(x)\,dx \right)$


A.4 Change of Variables in Integration 

It is often convenient, in order to perform an integration, to change the
variables from one set to another. With a single variable the process is probably
familiar to students who have had a course of calculus. If the variable is changed
from x to u, where x = g(u), if g(u) is a differentiable function of u, and if f(x) is
integrable from a to b, then

(A.4.1) $\int_a^b f(x)\,dx = \int_\alpha^\beta f[g(u)]\, g'(u)\,du$

where $g'(u) = dg(u)/du$, $a = g(\alpha)$ and $b = g(\beta)$. Thus,

$\int_0^1 \dfrac{dx}{(1 - x^2)^{1/2}} = \int_0^{\pi/2} \dfrac{\cos u\,du}{(1 - \sin^2 u)^{1/2}} = \int_0^{\pi/2} du = \dfrac{\pi}{2}$

where the change of variable is expressed by x = sin u.

Care is necessary in determining the new bounds of integration $\alpha$ and $\beta$ to
make sure that the interval from $\alpha$ to $\beta$ for u does correspond to the interval
from a to b for x, particularly when x is not a single-valued function of u. Thus,
for example, if $u = x^2$, either interval of x, from -2 to -1 or from 1 to 2,
would correspond to the same interval of u from 1 to 4. In the first case, g'(u) is
negative, and in the second, positive, but the bounds of integration for u are
interchanged in the two cases.

If there are two variables x and y, and the integral is a double one, we may
need to change to a new pair of variables u and v, where x = g(u, v) and
y = h(u, v). A double integral over a region R of the (x, y) plane is calculated by
means of a repeated integral

(A.4.2) $\iint_R f(x, y)\,dA = \int_a^b \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy\,dx$

The region of integration is considered as bounded by the curves $y = g_1(x)$ and
$y = g_2(x)$ between x = a and x = b. The corresponding region $R_1$ in the (u, v)
plane is bounded by the curves $v = \gamma_1(u)$, $v = \gamma_2(u)$, between $u = \alpha$ and $u = \beta$
(Figure 56). The element of area dy dx in the (x, y) plane becomes in the (u, v)
plane

(A.4.3) $dA_1 = \left| J\!\left(\dfrac{g, h}{u, v}\right) \right| du\,dv$

where

(A.4.4) $J\!\left(\dfrac{g, h}{u, v}\right) = \begin{vmatrix} \partial g/\partial u & \partial g/\partial v \\ \partial h/\partial u & \partial h/\partial v \end{vmatrix} = \dfrac{\partial g}{\partial u}\dfrac{\partial h}{\partial v} - \dfrac{\partial g}{\partial v}\dfrac{\partial h}{\partial u}$

FIG. 56 CHANGE OF VARIABLES IN INTEGRATION


The functions g and h are supposed to possess continuous first partial derivatives
throughout the region $R_1$, and it is also supposed that $J\!\left(\frac{g, h}{u, v}\right)$ does not vanish
anywhere in $R_1$. This function J is called the Jacobian of g and h with respect
to u and v. The double integral (A.4.2) can then be written as the repeated
integral

(A.4.5) $\int_\alpha^\beta \int_{\gamma_1(u)}^{\gamma_2(u)} f[g(u, v), h(u, v)] \left| J\!\left(\dfrac{g, h}{u, v}\right) \right| dv\,du$


An example is the change from Cartesian coordinates x, y to polar coordinates
r, $\theta$, where $x = g(r, \theta) = r\cos\theta$, $y = h(r, \theta) = r\sin\theta$. Here

$J\!\left(\dfrac{g, h}{r, \theta}\right) = \dfrac{\partial g}{\partial r}\dfrac{\partial h}{\partial \theta} - \dfrac{\partial g}{\partial \theta}\dfrac{\partial h}{\partial r} = \cos\theta\,(r\cos\theta) - (-r\sin\theta)\sin\theta = r$

If f(x, y) is integrable over the whole first quadrant in the (x, y) plane, we
have

(A.4.6) $\int_0^\infty \int_0^\infty f(x, y)\,dy\,dx = \int_0^\infty \int_0^{\pi/2} f(r\cos\theta, r\sin\theta)\, r\,d\theta\,dr$

since in the first quadrant $\theta$ ranges from 0 to $\pi/2$ and r from 0 to $\infty$.
The equations x = g(u, v), y = h(u, v) can be solved to express u and v in
terms of x and y. If the Jacobian does not vanish, the resulting functions,
$u = \phi(x, y)$ and $v = \psi(x, y)$, will themselves be differentiable. We can then
calculate $J\!\left(\frac{\phi, \psi}{x, y}\right)$. It is sometimes convenient to note that

(A.4.7) $J\!\left(\dfrac{g, h}{u, v}\right) = \dfrac{1}{J\!\left(\dfrac{\phi, \psi}{x, y}\right)}$

since one of these Jacobians may be easier to calculate than the other.
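
A numerical illustration of the polar-coordinate Jacobian and of the reciprocal relation (A.4.7) may be helpful. The sketch below estimates the partial derivatives by central differences; the helper names, the step size and the evaluation point are assumptions made for the example, not part of the text.

```python
import math


def jacobian(g, h, u, v, eps=1e-6):
    """Estimate dg/du * dh/dv - dg/dv * dh/du at (u, v) by central differences."""
    g_u = (g(u + eps, v) - g(u - eps, v)) / (2 * eps)
    g_v = (g(u, v + eps) - g(u, v - eps)) / (2 * eps)
    h_u = (h(u + eps, v) - h(u - eps, v)) / (2 * eps)
    h_v = (h(u, v + eps) - h(u, v - eps)) / (2 * eps)
    return g_u * h_v - g_v * h_u


def x_of(r, theta):
    return r * math.cos(theta)


def y_of(r, theta):
    return r * math.sin(theta)


def r_of(x, y):
    return math.hypot(x, y)


def theta_of(x, y):
    return math.atan2(y, x)


def forward_and_inverse(r, theta):
    """Jacobian of (x, y) w.r.t. (r, theta), and of (r, theta) w.r.t. (x, y)."""
    j_forward = jacobian(x_of, y_of, r, theta)
    j_inverse = jacobian(r_of, theta_of, x_of(r, theta), y_of(r, theta))
    return j_forward, j_inverse


if __name__ == "__main__":
    print(forward_and_inverse(2.0, 0.7))
```

At r = 2 the first value is close to 2 (the Jacobian r) and the second close to 1/2, so their product is close to 1.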


The above considerations may be extended to triple or n-dimensional 
integrals. 


A.5 The Gamma Function 
The improper convergent integral

(A.5.1) $\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx, \qquad n > 0$

is called the gamma function of n. Using the formula for integration by parts we
easily obtain

(A.5.2) $\Gamma(n + 1) = \int_0^\infty x^{n} e^{-x}\,dx$
$= \left[-x^{n} e^{-x}\right]_0^\infty + n \int_0^\infty x^{n-1} e^{-x}\,dx$
$= n\Gamma(n)$

since $\lim_{x \to \infty} x^{n} e^{-x} = 0$ for all n > 0.
If we put n = 1 in Eq. (1), we obtain $\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1$ so that
$\Gamma(2) = 1 \cdot \Gamma(1) = 1$, $\Gamma(3) = 2 \cdot \Gamma(2) = 2$, and in general for any positive integral value of n,

(A.5.3) $\Gamma(n + 1) = n(n - 1) \dots 1 = n!$

The gamma function may therefore be regarded as a generalized factorial, and
indeed the notation n! is often used for the integral denoted here by $\Gamma(n + 1)$ for
any n > -1, whether integral or not.

For n = 0, the integral of Eq. (1) diverges.
For negative values of n, except negative integers, $\Gamma(n)$ may be defined by
means of Eq. (2),

$\Gamma(n) = \dfrac{1}{n}\,\Gamma(n + 1)$

Thus
$\Gamma(-\tfrac{1}{2}) = -2\,\Gamma(\tfrac{1}{2})$

The graph of $\Gamma(n)$ is shown in Figure 57. The function is discontinuous at all
negative integral values and at n = 0.

FIG. 57 THE GAMMA FUNCTION


An alternative form for the gamma function is obtained by a change of
variable. If $x = u^2$, we have by Eq. (A.4.1)

(A.5.4) $\Gamma(n) = 2\int_0^\infty u^{2n-1} e^{-u^2}\,du$
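
These properties are easy to confirm numerically. The sketch below relies on `math.gamma` from the Python standard library and on a hand-written Simpson rule; truncating the infinite range at u = 10 and the number of steps are assumptions chosen for the example.

```python
import math


def simpson(func, lo, hi, steps=20000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def gamma_alternative(n, upper=10.0):
    """Approximate 2 * integral_0^upper u^(2n-1) exp(-u^2) du, the form (A.5.4)."""
    return 2 * simpson(lambda u: u ** (2 * n - 1) * math.exp(-u * u), 0.0, upper)


if __name__ == "__main__":
    print(math.gamma(5))                                 # factorial property: 4!
    print(math.gamma(3.5), 2.5 * math.gamma(2.5))        # recurrence (A.5.2)
    print(math.gamma(-0.5), -2 * math.gamma(0.5))        # extension to negative n
    print(gamma_alternative(2.5), math.gamma(2.5))       # alternative form (A.5.4)
```

The integrand in `gamma_alternative` is only well behaved at u = 0 for n ≥ 1/2, so the sketch should not be used for smaller n without modification.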


A.6 The Beta Function 
The definite integral

(A.6.1) $B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx, \qquad m > 0,\ n > 0$

is called the beta function of m and n. Clearly, B(1, 1) = 1. If we write 1 - y
for x, we obtain

(A.6.2) $B(m, n) = -\int_1^0 (1 - y)^{m-1} y^{n-1}\,dy = \int_0^1 y^{n-1}(1 - y)^{m-1}\,dy = B(n, m)$

so that, in the beta function, m and n may be interchanged at will. Alternative
forms for the beta function may be obtained by change of variable. Thus, if
$x = \sin^2\theta$,

(A.6.3) $B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\, \cos^{2n-1}\theta\,d\theta$

and if $x = (1 + y)^{-1}$

(A.6.4) $B(m, n) = \int_0^\infty y^{n-1} (1 + y)^{-m-n}\,dy$
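
As a check on these forms, the following sketch evaluates (A.6.1) and (A.6.3) with a simple Simpson rule and compares them. The last helper uses the standard identity B(m, n) = Γ(m)Γ(n)/Γ(m + n), which the text goes on to state as (A.6.5). All function names and the choice of test values are illustrative assumptions.

```python
import math


def simpson(func, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def beta_direct(m, n):
    """Form (A.6.1); intended here for integer m, n >= 1 (smooth integrand)."""
    return simpson(lambda x: x ** (m - 1) * (1 - x) ** (n - 1), 0.0, 1.0)


def beta_trig(m, n):
    """Form (A.6.3): 2 * integral of sin^(2m-1) * cos^(2n-1) over (0, pi/2)."""
    return 2 * simpson(
        lambda t: math.sin(t) ** (2 * m - 1) * math.cos(t) ** (2 * n - 1),
        0.0,
        math.pi / 2,
    )


def beta_from_gamma(m, n):
    """Gamma-function expression for B(m, n)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)


if __name__ == "__main__":
    print(beta_direct(3, 2), beta_trig(3, 2), beta_trig(2, 3))
    print(beta_trig(2.5, 1.5), beta_from_gamma(2.5, 1.5), math.pi / 16)
```

For m = 3, n = 2 both forms give 1/12, and interchanging m and n leaves the value unchanged, as (A.6.2) requires.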


An important relation between the beta function and the gamma function is
the following:

(A.6.5) $B(m, n) = \dfrac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}$

To prove this, we note from Eq. (A.5.4) that

$\Gamma(m)\Gamma(n) = 4\int_0^\infty x^{2m-1} e^{-x^2}\,dx \int_0^\infty y^{2n-1} e^{-y^2}\,dy$

This repeated integral may be written as a double integral:

$4\int_0^\infty \int_0^\infty x^{2m-1} y^{2n-1} e^{-(x^2 + y^2)}\,dy\,dx$

and interpreted as an integral over the first quadrant of the (x, y) plane. Changing
to polar coordinates (r, $\theta$) and using Eq. (A.4.6), we find

$\Gamma(m)\Gamma(n) = 4\int_0^\infty \int_0^{\pi/2} (r\cos\theta)^{2m-1} (r\sin\theta)^{2n-1} e^{-r^2}\, r\,d\theta\,dr$

$= 4\int_0^\infty r^{2m+2n-1} e^{-r^2}\,dr \int_0^{\pi/2} \cos^{2m-1}\theta\, \sin^{2n-1}\theta\,d\theta$
where the double integral is now written as a repeated integral. Using Eqs.
(A.5.4) and (A.6.3), we obtain
$\Gamma(m) \cdot \Gamma(n) = \Gamma(m + n) \cdot B(n, m)$
whence, by Eq. (2),
$B(m, n) = \dfrac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}$


A.7 The Integral $\int e^{-u^2}\,du$ and Related Integrals
From Eq. (A.6.3),

(A.7.1) $B(\tfrac{1}{2}, \tfrac{1}{2}) = 2\int_0^{\pi/2} d\theta = \pi$

But, by Eq. (A.6.5),

$B(\tfrac{1}{2}, \tfrac{1}{2}) = \dfrac{\Gamma(\tfrac{1}{2})\Gamma(\tfrac{1}{2})}{\Gamma(1)} = [\Gamma(\tfrac{1}{2})]^2$
so that
(A.7.2) $\Gamma(\tfrac{1}{2}) = \pi^{1/2}$
Therefore, from Eq. (A.5.4), with n = 1/2,
$2\int_0^\infty e^{-u^2}\,du = \pi^{1/2}$
or, since $e^{-u^2}$ is an even function of u,
(A.7.3) $\int_{-\infty}^{\infty} e^{-u^2}\,du = \pi^{1/2}$
Writing $\sqrt{2}\,u = v$, we obtain
(A.7.4) $\int_{-\infty}^{\infty} e^{-v^2/2}\,dv = (2\pi)^{1/2}$
or
(A.7.5) $\int_{-\infty}^{\infty} \phi(v)\,dv = 1, \qquad \phi(v) = (2\pi)^{-1/2} e^{-v^2/2}$

This expresses the fact that the total area under the standard normal curve is 1,
as it must be if $\phi(v)$ is to be a probability density function.
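A useful related integral is

(A.7.6) $I_k = \int_0^\infty v^{k} e^{-v^2/2}\,dv, \qquad k = 1, 2, 3, \dots$

Putting $v = \sqrt{2}\,u$, we get

(A.7.7) $I_k = 2^{(k+1)/2} \int_0^\infty u^{k} e^{-u^2}\,du$
$= 2^{(k+1)/2} \cdot \tfrac{1}{2}\,\Gamma\!\left(\tfrac{k+1}{2}\right)$ by Eq. (A.5.4)
$= 2^{(k-1)/2}\,\Gamma\!\left(\tfrac{k+1}{2}\right)$
Thus,
$I_1 = \Gamma(1) = 1$
$I_2 = 2^{1/2}\,\Gamma(\tfrac{3}{2}) = \left(\dfrac{\pi}{2}\right)^{1/2}$
(A.7.8) $I_3 = 2\,\Gamma(2) = 2$
$I_4 = 2^{3/2}\,\Gamma(\tfrac{5}{2}) = 2^{3/2} \cdot \tfrac{3}{2} \cdot \tfrac{1}{2} \cdot \Gamma(\tfrac{1}{2}) = 3\left(\dfrac{\pi}{2}\right)^{1/2}$

and so on.
A.8 Wallis's Formula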
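By Eq. (A.1.5), $\lim_{n \to \infty}\left(1 - \frac{t}{n}\right)^{n} = e^{-t}$, so that, for x > 0,

(A.8.1) $\lim_{n \to \infty} \int_0^n \left(1 - \dfrac{t}{n}\right)^{n} t^{x-1}\,dt = \int_0^\infty e^{-t} t^{x-1}\,dt = \Gamma(x)$

Putting t/n = u, we get

$\lim_{n \to \infty} \int_0^1 (1 - u)^{n} (un)^{x-1}\, n\,du = \Gamma(x)$

and therefore,

(A.8.2) $\lim_{n \to \infty} n^{x} \int_0^1 u^{x-1} (1 - u)^{n}\,du = \Gamma(x)$

The integral in Eq. (2) is $B(x, n + 1) = \Gamma(x)\Gamma(n + 1)/\Gamma(n + x + 1)$, so that

(A.8.3) $\lim_{n \to \infty} \dfrac{n^{x}\,\Gamma(n + 1)}{\Gamma(n + x + 1)} = 1$

Putting x = 1/2, and noting that
$\Gamma(n + 1) = n!$
and
$\Gamma(n + \tfrac{3}{2}) = (n + \tfrac{1}{2})(n - \tfrac{1}{2})(n - \tfrac{3}{2}) \dots \tfrac{1}{2}\,\Gamma(\tfrac{1}{2})$
$= \dfrac{2n + 1}{2} \cdot \dfrac{2n - 1}{2} \cdot \dfrac{2n - 3}{2} \dots \dfrac{1}{2} \cdot \pi^{1/2}$
$= \dfrac{(2n + 1)!\,\pi^{1/2}}{2n(2n - 2) \dots (2)\, 2^{n+1}}$
$= \dfrac{(2n + 1)!\,\pi^{1/2}}{n!\, 2^{2n+1}}$
we obtain from Eq. (3)

(A.8.4) $\lim_{n \to \infty} \dfrac{n^{1/2} (n!)^2\, 2^{2n+1}}{(2n + 1)!\,\pi^{1/2}} = 1$
or

(A.8.5) $\lim_{n \to \infty} \dfrac{2^{2n} (n!)^2}{(2n)!\, n^{1/2}} = \pi^{1/2}$

This is Wallis's formula.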
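
The rate at which the ratio in Wallis's formula approaches $\pi^{1/2}$ can be seen from a short computation. This is a sketch only; it works with logarithms (via `math.lgamma`) so that large factorials do not overflow, and the function name is hypothetical.

```python
import math


def wallis_ratio(n: int) -> float:
    """Return 2^(2n) (n!)^2 / ((2n)! n^(1/2)), computed through logarithms."""
    log_ratio = (
        2 * n * math.log(2.0)
        + 2 * math.lgamma(n + 1)      # log (n!)^2
        - math.lgamma(2 * n + 1)      # log (2n)!
        - 0.5 * math.log(n)
    )
    return math.exp(log_ratio)


if __name__ == "__main__":
    target = math.sqrt(math.pi)
    for n in (1, 10, 100, 10**4, 10**6):
        print(n, wallis_ratio(n), wallis_ratio(n) - target)
```

The ratio starts at 2 for n = 1 and decreases slowly towards 1.77245..., the value of $\pi^{1/2}$.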


A.9 Differentiation Under the Sign of Integration 
Let $f(x, \theta)$ be a function of x depending on a parameter $\theta$ and continuous
over an interval of x between a and b, where a and b are themselves differentiable
functions of $\theta$. Suppose also that the partial derivative $\partial f/\partial\theta$ exists and is
continuous over the same interval of x, for all admissible values of $\theta$. Then if

(A.9.1) $I(\theta) = \int_{a(\theta)}^{b(\theta)} f(x, \theta)\,dx$

the derivative of $I(\theta)$ with respect to $\theta$ is given by

(A.9.2) $\dfrac{dI(\theta)}{d\theta} = \int_a^b \dfrac{\partial f}{\partial\theta}\,dx + f(b, \theta)\dfrac{db}{d\theta} - f(a, \theta)\dfrac{da}{d\theta}$

This is known as Leibniz's formula. A proof may be found in textbooks of
advanced calculus, such as Sokolnikoff's (McGraw-Hill, 1939), page 121.
If a and b are independent of $\theta$, the last two terms vanish and Eq. (2) becomes

(A.9.3) $\dfrac{dI}{d\theta} = \int_a^b \dfrac{\partial f}{\partial\theta}\,dx$

EXAMPLE Let $f(x, \theta) = e^{-x^2/2\theta}$, and let $a = -(2\theta)^{1/2}$, $b = (2\theta)^{1/2}$. Then
$da/d\theta = -(2\theta)^{-1/2}$, $db/d\theta = (2\theta)^{-1/2}$, and $\partial f/\partial\theta = (x^2/2\theta^2)\,e^{-x^2/2\theta}$. Also
$f(a, \theta) = f(b, \theta) = e^{-1}$, so that if

$I(\theta) = \int_{-(2\theta)^{1/2}}^{(2\theta)^{1/2}} e^{-x^2/2\theta}\,dx$

we have

$\dfrac{dI}{d\theta} = \dfrac{1}{2\theta^2} \int_{-(2\theta)^{1/2}}^{(2\theta)^{1/2}} x^2 e^{-x^2/2\theta}\,dx + 2e^{-1}(2\theta)^{-1/2}$
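
The example can be verified numerically by comparing Leibniz's formula with a finite-difference derivative of the integral itself. The sketch below is an illustration under stated assumptions: the Simpson rule, the step sizes and the value θ = 1.5 are arbitrary choices, and the closed form used in the last line, $dI/d\theta = \pi^{1/2}\,\mathrm{erf}(1)/(2\theta)^{1/2}$, follows from the substitution $x = (2\theta)^{1/2} t$ rather than from the text.

```python
import math


def simpson(func, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3


def integral_i(theta):
    """I(theta): integral of exp(-x^2 / (2 theta)) between -(2 theta)^(1/2) and (2 theta)^(1/2)."""
    b = math.sqrt(2 * theta)
    return simpson(lambda x: math.exp(-x * x / (2 * theta)), -b, b)


def leibniz_derivative(theta):
    """Right-hand side of Leibniz's formula for the example."""
    b = math.sqrt(2 * theta)
    inner = simpson(lambda x: x * x * math.exp(-x * x / (2 * theta)), -b, b)
    return inner / (2 * theta ** 2) + 2 * math.exp(-1) / b


def finite_difference(theta, eps=1e-5):
    """Central-difference estimate of dI/dtheta."""
    return (integral_i(theta + eps) - integral_i(theta - eps)) / (2 * eps)


if __name__ == "__main__":
    theta = 1.5
    print(leibniz_derivative(theta), finite_difference(theta))
    print(math.sqrt(math.pi) * math.erf(1) / math.sqrt(2 * theta))
```

All three printed values agree to several decimal places, which confirms that the two boundary terms each contribute $e^{-1}(2\theta)^{-1/2}$.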
A.10 Orthogonal Linear Transformations
The linear transformation

(A.10.1) $Y_i = \sum_{j=1}^{n} c_{ij} X_j, \qquad i = 1, 2, \dots, n$

is called orthogonal if the constants $c_{ij}$ satisfy the conditions

(A.10.2) $\sum_{k} c_{ik} c_{jk} = \begin{cases} 1, & \text{when } i = j \\ 0, & \text{when } i \ne j \end{cases}$

If the determinant of the coefficients $c_{ij}$ is multiplied by itself, with rows and
columns transposed (transposing does not alter its value), the result, using
Eq. (2), is

$\begin{vmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{vmatrix} = 1$

The value of the determinant is therefore ±1, and it can be taken as 1 by
changing, if necessary, the sign of one of the $Y_i$. Since

(A.10.3) $\dfrac{\partial Y_i}{\partial X_j} = c_{ij}$

this determinant is also the Jacobian of the Y's with respect to the X's (see
Appendix A.4). Therefore,

(A.10.4) $dY_1\, dY_2 \dots dY_n = dX_1\, dX_2 \dots dX_n$

It can be proved by using some matrix algebra (see § A.22) that Eq. (2)
implies

(A.10.5) $\sum_{k} c_{ki} c_{kj} = \begin{cases} 1, & \text{when } i = j \\ 0, & \text{when } i \ne j \end{cases}$

From Eq. (1), by squaring,
(A.10.6) $Y_i^2 = \sum_{j} c_{ij}^2 X_j^2 + \sum_{j \ne k} c_{ij} c_{ik} X_j X_k$
and therefore, by Eq. (5),
(A.10.7) $\sum_{i} Y_i^2 = \sum_{j} X_j^2$

The orthogonal transformation with determinant 1 is equivalent geometrically
to a rotation of the coordinate axes about the origin. Such a rotation, of course,
leaves the distance of any point from the origin unchanged, and this is the
meaning of Eq. (7).
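
A plane rotation gives the simplest concrete instance of these statements. The sketch below assumes a 2 by 2 rotation through an arbitrary angle; the helper names are illustrative and not part of the text.

```python
import math


def rotation_matrix(angle):
    """Coefficients c_ij of a plane rotation, one example of an orthogonal transformation."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [-s, c]]


def apply(matrix, xs):
    """Y_i = sum_j c_ij X_j."""
    return [sum(c * x for c, x in zip(row, xs)) for row in matrix]


def row_product(matrix, i, k):
    """sum_j c_ij c_kj, which should be 1 when i == k and 0 otherwise."""
    return sum(a * b for a, b in zip(matrix[i], matrix[k]))


def sum_of_squares(values):
    return sum(v * v for v in values)


if __name__ == "__main__":
    m = rotation_matrix(0.4)
    xs = [3.0, -4.0]
    ys = apply(m, xs)
    print(row_product(m, 0, 0), row_product(m, 0, 1))
    print(sum_of_squares(xs), sum_of_squares(ys))
```

The row products reproduce the orthogonality conditions, and the sum of squares (25 here) is the same before and after the transformation.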


A.11 Angle Brackets and k-Statistics 
As in § 5.8 we define

$S_p = \sum_{i=1}^{N} X_i^{p} = N\langle p \rangle$
$N(N - 1)\langle pq \rangle = \sum'_{i,j} X_i^{p} X_j^{q}$
$N(N - 1)(N - 2)\langle pqr \rangle = \sum'_{i,j,k} X_i^{p} X_j^{q} X_k^{r}$

and so on, where the symbol $\sum'$ indicates that in every term of the sum the
subscripts i, j ... are to be different.
Then

$S_1^2 = \left(\sum_i X_i\right)\left(\sum_j X_j\right) = \sum_i X_i^2 + \sum'_{i,j} X_i X_j$

where we have separated the terms of the product in which i = j from the terms
in which i and j are different. Since $\sum_i X_i^2 = S_2$, and $\sum'_{i,j} X_i X_j = N(N - 1)\langle 11 \rangle$,
we have

(A.11.1) $S_1^2 = S_2 + N(N - 1)\langle 11 \rangle$
Again,
$S_1 S_2 = \left(\sum_i X_i\right)\left(\sum_j X_j^2\right) = \sum_i X_i^3 + \sum'_{i,j} X_i X_j^2$
from which we get

(A.11.2) $S_1 S_2 = S_3 + N(N - 1)\langle 12 \rangle$

Other results given in Eq. (5.8.2) may be found similarly. Also,

$N(N - 1)\, S_1 \langle 11 \rangle = \left(\sum_i X_i\right)\left(\sum'_{j,k} X_j X_k\right)$

In the terms of this product there will be some in which i = j, some in which
i = k, and some in which i, j, k are all different. Therefore,

$N(N - 1)\, S_1 \langle 11 \rangle = \sum'_{j,k} X_j^2 X_k + \sum'_{j,k} X_j X_k^2 + \sum'_{i,j,k} X_i X_j X_k$
$= 2N(N - 1)\langle 12 \rangle + N(N - 1)(N - 2)\langle 111 \rangle$
Therefore,
(A.11.3) $(N - 2)\langle 111 \rangle = S_1 \langle 11 \rangle - 2\langle 12 \rangle$

Similar arguments will give the other results of Eq. (5.8.3). The checks
Eq. (5.8.6) are straightforward algebraic relations derived from the definitions
of the k's and the above properties of the brackets. Thus the first one states that


«пу = (05? — №162) — 1D) 


or 


Nap -(0 -N"Q 
-NUCSQ-NOS, 


which is equivalent to Eq. (A.11.1).
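
The bracket identities can be confirmed by brute force on a small sample. In the sketch below the angle bracket is computed directly from its definition as an average over ordered tuples of distinct subscripts; the sample values are hypothetical and exact rational arithmetic is used so that the identities hold exactly.

```python
from fractions import Fraction
from itertools import permutations


def power_sum(xs, p):
    """S_p = sum of X_i^p."""
    return sum(x ** p for x in xs)


def bracket(xs, *powers):
    """<p q ...>: average of X_i^p X_j^q ... over ordered tuples of distinct subscripts."""
    index_tuples = list(permutations(range(len(xs)), len(powers)))
    total = 0
    for indices in index_tuples:
        term = 1
        for i, p in zip(indices, powers):
            term *= xs[i] ** p
        total += term
    return Fraction(total, len(index_tuples))


if __name__ == "__main__":
    xs = [1, 2, 4, 7]  # hypothetical integer sample
    n = len(xs)
    print(power_sum(xs, 1) ** 2, power_sum(xs, 2) + n * (n - 1) * bracket(xs, 1, 1))
    print((n - 2) * bracket(xs, 1, 1, 1),
          power_sum(xs, 1) * bracket(xs, 1, 1) - 2 * bracket(xs, 1, 2))
```

Each printed pair consists of the two sides of (A.11.1) and (A.11.3) respectively, and the two members of each pair coincide.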


A.12 Bernoulli Numbers and Sheppard's Corrections
The r-th Bernoulli number is defined as the coefficient of $t^{r}/r!$ in the expansion
of $t(e^{t} - 1)^{-1}$. Therefore,

(A.12.1) $\sum_{r=0}^{\infty} B_r \dfrac{t^{r}}{r!} = t(e^{t} - 1)^{-1} = \dfrac{t}{2}\left(\coth\dfrac{t}{2} - 1\right)$
where $\coth x = (e^{x} + e^{-x})/(e^{x} - e^{-x})$.

The first few of these numbers are

$B_0 = 1, \quad B_1 = -\tfrac{1}{2}, \quad B_2 = \tfrac{1}{6}, \quad B_4 = -\tfrac{1}{30}, \quad B_6 = \tfrac{1}{42}$

All the B's with odd subscript, except $B_1$, are zero.
In the grouping of a distribution into classes with class-interval c, any value
X of the variate is recorded as the nearest class-mark $X_i$. The difference between
$X_i$ and X is not greater numerically than c/2. That is,
(A.12.2) $X_i = X + \varepsilon$
where $\varepsilon$ may be supposed to have a uniform (rectangular) distribution on the
interval -c/2 to c/2. This, of course, is not usually true and is an assumption for
reasons of mathematical convenience, but if the grouping is reasonably fine
(intervals short compared with the effective range) it is not likely to be very far
out. If $K(h)$, $K_1(h)$ and $K_\varepsilon(h)$ are the cumulant generating functions for X, $X_i$
and $\varepsilon$ respectively,

(A.12.3) $K_1(h) = K(h) + K_\varepsilon(h)$

by the main property of such functions, § 2.12.

Now $K_1(h)$ is the c.g.f. for the grouped distribution and will give the uncorrected
cumulants, while $K(h)$ is the c.g.f. for the true distribution and will give
the true cumulants. Also, by § 2.10, Example 4,

(A.12.4) $K_\varepsilon(h) = \log M_\varepsilon(h) = \log\sinh\dfrac{ch}{2} - \log\dfrac{ch}{2}$
$= \log\sinh(t/2) - \log(t/2)$

where t = ch.
Differentiating with respect to t, we have

(A.12.5) $\dfrac{dK_\varepsilon(h)}{dt} = \dfrac{1}{2}\coth\left(\dfrac{t}{2}\right) - \dfrac{1}{t}$
But, by Eq. (1),
$\dfrac{t}{2}\left(\coth\dfrac{t}{2} - 1\right) = 1 - \dfrac{t}{2} + \sum_{r=2}^{\infty} B_r \dfrac{t^{r}}{r!}$

so that, on dividing by t,

(A.12.6) $\dfrac{1}{2}\coth\left(\dfrac{t}{2}\right) - \dfrac{1}{t} = \sum_{r=2}^{\infty} B_r \dfrac{t^{r-1}}{r!}$
From Eqs. (5) and (6), therefore,

$\dfrac{dK_\varepsilon(h)}{dt} = \sum_{r=2}^{\infty} B_r \dfrac{t^{r-1}}{r!}$

from which, after integration, we obtain
$K_\varepsilon(h) = C + \sum_{r=2}^{\infty} B_r \dfrac{t^{r}}{r \cdot r!}$
Since $K_\varepsilon(h) \to 0$ as $t \to 0$ (t = ch), the constant C is zero, and therefore,

$K(h) = K_1(h) - \sum_{r=2}^{\infty} B_r \dfrac{t^{r}}{r \cdot r!}$
This is equivalent to

(A.12.7) $\kappa_r = (\kappa_r)_1 - \dfrac{B_r}{r}\,c^{r}, \qquad r = 2, 3, \dots$
where $(\kappa_r)_1$ denotes the uncorrected cumulant given by $K_1(h)$.
In practice, the corrections are usually applied to the sample k-statistics,
rather than to the cumulants, since the latter are seldom known except insofar as
they are estimated by the former.
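
The Bernoulli numbers and the resulting correction terms can be generated exactly. The sketch below uses the standard recurrence $\sum_{j=0}^{m} \binom{m+1}{j} B_j = 0$ (for m ≥ 1, with $B_0 = 1$), which is a known property of these numbers rather than something derived in the text; the correction term $B_r c^{r}/r$ is the quantity that the series for $K_\varepsilon(h)$ above attaches to the r-th cumulant. Function names are illustrative.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(count):
    """Return [B_0, B_1, ..., B_(count-1)] with the convention B_1 = -1/2."""
    numbers = [Fraction(1)]
    for m in range(1, count):
        acc = sum(comb(m + 1, j) * numbers[j] for j in range(m))
        numbers.append(-acc / (m + 1))
    return numbers


def sheppard_term(r, c):
    """B_r c^r / r: the amount subtracted from the uncorrected r-th cumulant."""
    return bernoulli_numbers(r + 1)[r] * c ** r / r


if __name__ == "__main__":
    print(bernoulli_numbers(7))
    for r in (2, 3, 4):
        print(r, sheppard_term(r, Fraction(1)))
```

With c = 1 the terms for r = 2, 3, 4 are 1/12, 0 and -1/120, so the second cumulant is reduced by $c^2/12$, the third is unchanged, and the fourth is increased by $c^4/120$.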
A.13 The Non-Central Chi-Square Distribution

The quantity x with probability density,

(A.13.1) $f(x) = e^{-x-\lambda}\, x^{k/2 - 1} \sum_{m=0}^{\infty} \dfrac{(\lambda x)^{m}}{m!\,\Gamma(m + k/2)}$

is said to have the non-central chi-square distribution with k degrees of freedom
and with parameter of non-centrality $\lambda$. When $\lambda = 0$ it reduces to the ordinary
chi-square distribution, the first term of the series being interpreted as $1/\Gamma(k/2)$,
and x taken as $\chi^2/2$.

If $X_1, X_2, \dots, X_k$ are independent normal variates, with unit variance and
with expectations $\mu_1, \mu_2, \dots, \mu_k$, and if $H_0$ is the null hypothesis which specifies
the values of the $\mu_i$ as $\mu_1^0, \mu_2^0, \dots, \mu_k^0$, then an unbiased test (see § 8.10) of the
hypothesis $H_0$ is provided by the rule: reject $H_0$ if

(A.13.2) $\sum_{i=1}^{k} (X_i - \mu_i^0)^2 > \chi_k^2(\alpha)$

where $\chi_k^2(\alpha)$ is the tabular value of central $\chi^2$ with k degrees of freedom,
corresponding to the significance level $\alpha$, i.e.,

(A.13.3) $\dfrac{1}{\Gamma(k/2)} \int_{\chi_k^2(\alpha)/2}^{\infty} u^{k/2 - 1} e^{-u}\,du = \alpha$

If $H_0$ is not true, but instead $\mu_i$ is not equal to $\mu_i^0$ for at least one value of i
(hypothesis $H_1$), the quantity on the left-hand side of (2) follows the non-central
chi-square distribution with k degrees of freedom and parameter

(A.13.4) $\lambda = \dfrac{1}{2} \sum_{i=1}^{k} (\mu_i - \mu_i^0)^2$

The power function of this test is
(A.13.5) $P(\lambda) = \Pr\left\{ \sum (X_i - \mu_i^0)^2 > \chi_k^2(\alpha) \mid H_1 \right\}$
$= e^{-\lambda} \sum_{m=0}^{\infty} \dfrac{\lambda^{m}}{m!\,\Gamma(m + k/2)} \int_{\chi_k^2(\alpha)/2}^{\infty} x^{m + k/2 - 1} e^{-x}\,dx$

In the tables prepared by Miss Evelyn Fix (reference [5] of Chapter 6) the
quantity tabulated is the value of $\lambda$ for certain assigned values of $\alpha$ and $P(\lambda)$, and
for k = 1(1)20(2)40(5)60(10)100. (The $\lambda$ of the tables is twice the $\lambda$ defined
above.)
A.14 Some Theorems on Conditional Probability

Let X be a random variable which takes the value $x_i$ with probability $p_i(Y)$,
subject to the occurrence of the event Y. That is,
(A.14.1) $p_i(Y) = P(X = x_i \mid Y), \qquad i = 1, 2, \dots$
$= P\{(X = x_i) \cap Y\}/P(Y)$
Then the conditional expectation of X, given Y, is defined as
(A.14.2) $E(X \mid Y) = \sum_{i} p_i(Y)\, x_i$

This is the same definition as for the ordinary expectation, except that
conditional probabilities are used.
If $Y_j$ is the event that the random variable Y takes the value $y_j$,

(A.14.3) $p_i(Y_j) = \dfrac{P(X = x_i \text{ and } Y = y_j)}{P(Y = y_j)}$

THEOREM 1 The expected value of X is equal to the expectation of the conditional
expectation of X, given Y. In symbols,

(A.14.4) $E(X) = E[E(X \mid Y)] = \sum_{j} p_j\, E(X \mid Y_j)$
where $p_j$ is the probability that $Y = y_j$. Proof:
$E(X \mid Y_j) = \sum_{i} x_i\, p_i(Y_j)$
by the definition of conditional expectation. Also, by Eq. (3),

$p_i(Y_j) = \dfrac{P(X = x_i, Y = y_j)}{p_j}$

Therefore, the right-hand side of Eq. (4) is $\sum_j \sum_i x_i\, P(X = x_i, Y = y_j)$. But the
sum over j covers all the possible values of Y, and some value must occur with
every X; consequently,

$\sum_{j} P(X = x_i, Y = y_j) = P(X = x_i)$
and the right-hand side of Eq. (4) reduces to
$\sum_{i} x_i\, P(X = x_i)$ which is E(X).
The conditional variance of X given Y is defined by
(A.14.5) $V(X \mid Y) = E\{[X - E(X \mid Y)]^2 \mid Y\}$
with a similar definition for conditional covariance.


THEOREM 2 The variance of X can be regarded as made up of two parts, the
expectation of the conditional variance and the variance of the conditional
expectation. Symbolically,

(A.14.6) $V(X) = E[V(X \mid Y)] + V[E(X \mid Y)]$
Proof: we may write X - E(X) as X - E(X|Y) + E(X|Y) - E(X), so that

(A.14.7) $[X - E(X)]^2 = [X - E(X \mid Y)]^2 + 2[X - E(X \mid Y)][E(X \mid Y) - E(X)] + [E(X \mid Y) - E(X)]^2$
The variance of X is the expectation of this expression. By Eq. (4),
$E[X - E(X \mid Y)]^2 = E(E\{[X - E(X \mid Y)]^2 \mid Y\})$
$= E[V(X \mid Y)]$, by Eq. (5)
Also, since E(X) = E[E(X|Y)], the expectation of the last term of Eq. (7) is
V[E(X|Y)]. It only remains to show that the expectation of the middle term of
Eq. (7) is zero. This middle term is

(A.14.8) $2\{X E(X \mid Y) - X E(X) + E(X) E(X \mid Y) - [E(X \mid Y)]^2\}$

Now E[X E(X|Y)] = E[E(X|Y) E(X|Y)] by Eq. (4), so that the expectations of
the first and last terms of (8) cancel. Similarly,

$E[X E(X)] = E[E(X \mid Y) \cdot E(X)]$
so that the expectations of the two middle terms in (8) cancel. The variance of
X is therefore the sum of expectations of the first and last terms of Eq. (7) and
this gives Eq. (6).
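
Both theorems can be checked exactly on a small discrete example. The joint distribution below is hypothetical, chosen only for illustration, and the helper names are not from the text.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X = x, Y = y), chosen only for illustration.
JOINT = {(0, 0): F(1, 8), (1, 0): F(3, 8), (0, 1): F(1, 4), (2, 1): F(1, 4)}


def marginal_y(joint):
    """p_j = P(Y = y_j)."""
    probs = {}
    for (_, y), p in joint.items():
        probs[y] = probs.get(y, 0) + p
    return probs


def conditional_moment(joint, y, power):
    """E(X^power | Y = y)."""
    p_y = marginal_y(joint)[y]
    return sum(x ** power * p for (x, yy), p in joint.items() if yy == y) / p_y


def moment(joint, power):
    """E(X^power)."""
    return sum(x ** power * p for (x, _), p in joint.items())


def variance(joint):
    return moment(joint, 2) - moment(joint, 1) ** 2


def decomposition(joint):
    """Return (E[E(X|Y)], E[V(X|Y)], V[E(X|Y)])."""
    p = marginal_y(joint)
    cond_mean = {y: conditional_moment(joint, y, 1) for y in p}
    cond_var = {y: conditional_moment(joint, y, 2) - cond_mean[y] ** 2 for y in p}
    mean_of_means = sum(p[y] * cond_mean[y] for y in p)
    mean_of_vars = sum(p[y] * cond_var[y] for y in p)
    var_of_means = sum(p[y] * cond_mean[y] ** 2 for y in p) - mean_of_means ** 2
    return mean_of_means, mean_of_vars, var_of_means


if __name__ == "__main__":
    print(moment(JOINT, 1), variance(JOINT))
    print(decomposition(JOINT))
```

For this distribution E(X) = 7/8 and V(X) = 39/64, and the decomposition returns 7/8 together with the two parts 19/32 and 1/64, whose sum is 39/64.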
A.15 Extrema of a Function of Several Variables Connected by Given Relations 


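For the sake of definiteness, let us think of a function of three variables,
f(x, y, z), for which we want a maximum or minimum subject to the given
relation $\phi(x, y, z) = 0$.
This relation may be regarded as expressing z in terms of x and y. The
conditions for an extremum of f are

(A.15.1) $\dfrac{\partial f}{\partial x} + \dfrac{\partial f}{\partial z}\dfrac{\partial z}{\partial x} = 0, \qquad \dfrac{\partial f}{\partial y} + \dfrac{\partial f}{\partial z}\dfrac{\partial z}{\partial y} = 0$

and the partial derivatives $\dfrac{\partial z}{\partial x}$, $\dfrac{\partial z}{\partial y}$ are connected by the equations

(A.15.2) $\dfrac{\partial \phi}{\partial x} + \dfrac{\partial \phi}{\partial z}\dfrac{\partial z}{\partial x} = 0, \qquad \dfrac{\partial \phi}{\partial y} + \dfrac{\partial \phi}{\partial z}\dfrac{\partial z}{\partial y} = 0$

Eliminating these partial derivatives from Eqs. (1) and (2), we get

(A.15.3) $\dfrac{\partial f}{\partial x}\dfrac{\partial \phi}{\partial z} - \dfrac{\partial f}{\partial z}\dfrac{\partial \phi}{\partial x} = 0, \qquad \dfrac{\partial f}{\partial y}\dfrac{\partial \phi}{\partial z} - \dfrac{\partial f}{\partial z}\dfrac{\partial \phi}{\partial y} = 0$

which may be written as Jacobians (see § A.4)

(A.15.4) $J\!\left(\dfrac{f, \phi}{x, z}\right) = 0, \qquad J\!\left(\dfrac{f, \phi}{y, z}\right) = 0$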


These, together with $\phi = 0$, determine the values of x, y and z corresponding
to extrema. Equations (3) express the conditions that we can find a function $\lambda$ to
satisfy the three equations

(A.15.5) $\dfrac{\partial f}{\partial x} + \lambda\dfrac{\partial \phi}{\partial x} = 0, \qquad \dfrac{\partial f}{\partial y} + \lambda\dfrac{\partial \phi}{\partial y} = 0, \qquad \dfrac{\partial f}{\partial z} + \lambda\dfrac{\partial \phi}{\partial z} = 0$

so that we may replace Eqs. (3) or (4) by the set (5), in which $\lambda$ is an unknown
auxiliary function. This set is given by equating to zero the partial derivatives
of the function $f + \lambda\phi$ with respect to the variables x, y, z where $\lambda$ is regarded
for the purpose of this differentiation as a constant. The quantity $\lambda$ is called an
undetermined multiplier, and the method is due originally to Lagrange.

The method may be extended to n variables connected by m relations. The
extrema of $f(x_1, x_2, \dots, x_n)$, subject to the conditions $\phi_1 = 0, \phi_2 = 0, \dots, \phi_m = 0$,
are found by equating to zero the n partial derivatives of the function,

$f + \lambda_1\phi_1 + \lambda_2\phi_2 + \dots + \lambda_m\phi_m,$

the undetermined multipliers $\lambda_1, \dots, \lambda_m$ being regarded as constants. The actual
values of these multipliers do not matter.


A.16 The Multinomial Theorem 

The multinomial theorem gives the expansion of $(x_1 + x_2 + \dots + x_k)^{n}$
where we suppose that n is a positive integer. Each term in the expansion is
formed by picking an $x_i$ from the set $x_1, x_2, \dots, x_k$, doing this n times and
multiplying the results together. If a particular $x_i$ is picked $n_i$ times, each term
in the product is of the form $x_1^{n_1} x_2^{n_2} \dots x_k^{n_k}$, where $\sum n_i = n$. The number of
terms like this which are identical in value is the number of ways of arranging
$n_1$ objects of one kind ($x_1$), $n_2$ of another kind ($x_2$), and so on, and by Theorem
1.11 this number is $n!/(n_1!\,n_2! \dots n_k!)$. We have, therefore,

(A.16.1) $(x_1 + x_2 + \dots + x_k)^{n} = \sum \dfrac{n!}{n_1!\,n_2! \dots n_k!}\, x_1^{n_1} x_2^{n_2} \dots x_k^{n_k}$

where the sum is taken over all sets of values of $n_1, n_2, \dots, n_k$ (all non-negative)
such that $\sum n_i = n$.

As an illustration, if n = 3 and k = 3, the possible sets of values of
$(n_1, n_2, n_3)$ are (3, 0, 0), (0, 3, 0), (0, 0, 3), (2, 1, 0), (2, 0, 1), (1, 2, 0), (1, 0, 2),
(0, 1, 2), (0, 2, 1) and (1, 1, 1), so that
$(x_1 + x_2 + x_3)^3 = x_1^3 + x_2^3 + x_3^3 + 3(x_1^2 x_2 + x_1^2 x_3 + x_1 x_2^2 + x_1 x_3^2 + x_2 x_3^2 + x_2^2 x_3) + 6 x_1 x_2 x_3$.

The terms in the sum on the right-hand side of Eq. (1) can be interpreted
as the probabilities that a random sample of n objects, drawn from a population
which is divided into k classes, will have exactly $n_1$ in the first class,
$n_2$ in the second class and so on. It is assumed that, in the population, the
probability that an object falls in the i-th class is $x_i$, where of course $x_i \ge 0$
and $\sum_{i=1}^{k} x_i = 1$.

Such a distribution of n objects among k classes is called a multinomial
distribution. The binomial distribution is the special case k = 2.
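
The expansion and its probability interpretation can be verified by direct enumeration. This is a small sketch with illustrative function names; it lists every set $(n_1, \dots, n_k)$ with total n, so it is only practical for small n and k.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def compositions(n, k):
    """All k-tuples of non-negative integers that sum to n."""
    return [c for c in product(range(n + 1), repeat=k) if sum(c) == n]


def coefficient(counts):
    """Multinomial coefficient n! / (n_1! n_2! ... n_k!)."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


def expansion(xs, n):
    """Right-hand side of the multinomial theorem, summed term by term."""
    return sum(
        coefficient(c) * prod(x ** e for x, e in zip(xs, c))
        for c in compositions(n, len(xs))
    )


if __name__ == "__main__":
    print(len(compositions(3, 3)))                      # number of terms for n = k = 3
    print(expansion([2, 3, 5], 3), (2 + 3 + 5) ** 3)    # both sides of the theorem
    probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    print(expansion(probs, 3))                          # multinomial probabilities sum to 1
```

For n = k = 3 there are ten terms, matching the ten sets listed in the illustration, and when the $x_i$ are probabilities the terms add up to 1.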


A.17 The Multinomial Distribution and Chi-Square 
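The probability of the particular multinomial distribution with $f_i$ objects in
the i-th class (i = 1, 2, ..., k) is, from (A.16.1),

(A.17.1) $p = \dfrac{N!}{f_1!\,f_2! \dots f_k!}\, \pi_1^{f_1} \pi_2^{f_2} \dots \pi_k^{f_k}$

where $\sum f_i = N$ and where $\pi_i$ is the probability that any object in the population
belongs to the i-th class. Therefore

(A.17.2) $\log p = \log N! - \sum \log f_i! + \sum f_i \log \pi_i$

Using Stirling's approximation (Appendix A.2), on the assumption that the
$f_i$ are all sufficiently large, we can replace $\log f_i!$ by
$(f_i + \tfrac{1}{2})\log f_i - f_i + \tfrac{1}{2}\log(2\pi)$, and thus obtain

(A.17.3) $\log p = (N + \tfrac{1}{2})\log N - N + \tfrac{1}{2}\log(2\pi) - \sum (f_i + \tfrac{1}{2})\log f_i + \sum f_i - \tfrac{k}{2}\log(2\pi) + \sum f_i \log \pi_i$

$= -\dfrac{k - 1}{2}\log(2\pi N) + \sum (f_i + \tfrac{1}{2})\log\dfrac{N\pi_i}{f_i} - \dfrac{1}{2}\sum \log \pi_i$

since the term N log N may be written $\sum_i f_i \log N$. We therefore have

(A.17.4) $\log p = \log C + \sum_i (f_i + \tfrac{1}{2})\log\dfrac{N\pi_i}{f_i}$
where
(A.17.5) $\log C = -\dfrac{k - 1}{2}\log(2\pi N) - \dfrac{1}{2}\sum_i \log \pi_i$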


If we put $\phi_i = N\pi_i$ (which is the expected number in the i-th class in a sample
of size N) and let

(A.17.6) $z_i = (f_i - \phi_i)/\phi_i^{1/2}$
we have
(A.17.7) $\log p - \log C = -\sum_i (f_i + \tfrac{1}{2})\log\dfrac{f_i}{\phi_i}$
$= -\sum_i (f_i + \tfrac{1}{2})\log(1 + \phi_i^{-1/2} z_i)$
$= -\sum_i (\phi_i + \phi_i^{1/2} z_i + \tfrac{1}{2})\log(1 + \phi_i^{-1/2} z_i).$

Now if the $\phi_i$ are fairly large, the differences between the $f_i$ and the $\phi_i$ will
usually be small compared with the $\phi_i$ themselves, so that we would expect
$|\phi_i^{-1/2} z_i|$ to be less than 1. If therefore we expand the logarithm in Eq. (7) in a
series, namely,

$\log(1 + \phi_i^{-1/2} z_i) = \phi_i^{-1/2} z_i - \tfrac{1}{2}\phi_i^{-1} z_i^2 + \dots$
and multiply by the preceding factor, we obtain
(A.17.8) $\log p - \log C = -\sum_i [\phi_i^{1/2} z_i + \tfrac{1}{2} z_i^2 + O(\phi_i^{-1/2})]$
But $\sum_i z_i \phi_i^{1/2} = \sum_i (f_i - \phi_i) = 0$, so that, to order $\phi_i^{-1/2}$,

$\log p - \log C = -\tfrac{1}{2}\sum_i z_i^2$
or, equivalently,

(A.17.9) $p = C e^{-\frac{1}{2}\sum_i z_i^2}, \qquad C = (2\pi N)^{-(k-1)/2} \prod_i \pi_i^{-1/2}$
i 


This means that the $z_i$ are approximately normally distributed about zero
with unit variance. They are not, however, independent, since they are subject
to a linear constraint

(A.17.10) $\sum_i z_i \phi_i^{1/2} = 0$

We have seen in § 4.6 that the sum of squares of n independent standard
normal variates is distributed as $\chi^2$ with n degrees of freedom. If we make an
orthogonal transformation (§ A.10) to new variables $y_i$, where $y_i = \sum_j c_{ij} z_j$,
and let $c_{kj} = \pi_j^{1/2}$ (that is, $\phi_j^{1/2}/N^{1/2}$, so that the k-th row has unit length),
the variable $y_k$ will be zero by Eq. (10) and we shall have
k - 1 independent variates, all normally distributed about zero with unit
variance. Also $\sum_{i=1}^{k} z_i^2 = \sum_{i=1}^{k-1} y_i^2$, so that $\sum z_i^2$ is the sum of squares of k - 1
independent standard normal variates and is therefore distributed as $\chi^2$ with
k - 1 d.f.

It may be shown that if the variates are connected by l linear constraints,
$\sum z_i^2$ is approximately $\chi^2$ with k - l d.f. This sum, by Eq. (6), may be written

$\sum_i z_i^2 = \sum_i \dfrac{(f_i - N\pi_i)^2}{N\pi_i}$

and this is the ordinary definition of $\chi_s^2$ for a variate grouped into classes.


A.18 Matrix Algebra 


A set of mn elements arranged in a rectangular array of m rows and n columns
is called a matrix of order m by n. When m = n the matrix is said to be square.
The elements may be real or complex numbers, but in statistical applications
they are usually real. The whole array is often denoted by a single letter, or by a
typical element enclosed in parentheses. Thus

$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = (a_{ij})$


The matrix is thought of as a single mathematical entity, which is subject to 
algebraic operations. These operations must, of course, be defined. 

A matrix with a single row or a single column is called a vector. 

A matrix is zero if and only if all its elements are zero.

Two matrices are conformable if they each have the same number of rows
and also the same number of columns.
Equality. Two conformable matrices are equal if and only if each element in
one is equal to the corresponding element in the other.
Addition. The sum of two conformable matrices A and B is a conformable
matrix C such that
(A.18.1) $c_{ij} = a_{ij} + b_{ij}$

That is, elements in corresponding positions in the two matrices are simply
added. Subtraction is similarly defined. The matrix denoted by -A has all its
elements opposite in sign to those of A.
Multiplication of a matrix and a number. If $\lambda$ is any number (real or complex)
the product $\lambda A$ is defined by
(A.18.2) $\lambda A = (\lambda a_{ij})$

Each element in A is multiplied by $\lambda$.
Multiplication. If A is an m by p matrix and B a p by n matrix, the product
AB is an m by n matrix defined by
(A.18.3) $AB = \left(\sum_{k=1}^{p} a_{ik} b_{kj}\right)$

For this product to exist it is necessary that the number of rows in B should be
equal to the number of columns in A. Each row in A is multiplied, term by term,
with each column in B.


EXAMPLE 1
Here 3(1) + 1(8) + 4(3) = 23, and so for the other elements of the product.

Matrix addition satisfies the ordinary commutative and associative laws of
elementary algebra, but multiplication, although associative, is not in general
commutative. If the matrices in Example 1 above are multiplied in the reverse
order we get an entirely different product:


PO a o. 4 [BS 3 
2 | 4 ME з о 42 
-i Z ğ 34 


It is not therefore true in general that matrix multiplication is commutative. It
is necessary to distinguish between "pre-multiplication" and "post-multiplication,"
or to speak of multiplying "on the left" or "on the right." If we
multiply B by A on the left, we get AB, and if on the right, BA.

Another law of elementary algebra that does not hold with matrices is the
product law. This asserts that if ab = 0, then either a or b must be zero. But AB
can be a zero matrix without either A or B being zero.

EXAMPLE 2

$\begin{pmatrix} 2 & -1 \\ 10 & -5 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 6 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$

On the other hand, the associative and distributive laws of algebra hold for
matrices, provided of course that the matrices are properly conformable for the
operations suggested. With this understanding we can write

(A.18.4) (AB)C = A(BC) = ABC
A(B + C) = AB + AC
(A + B)C = AC + BC

A.19 Transposition


If the successive columns of a matrix A are written as successive rows of a
new matrix A', then A' is called the transpose of A. If A is an m by n matrix, A'
is an n by m matrix, and $a'_{ij} = a_{ji}$.

EXAMPLE 3

$A = \begin{pmatrix} 3 & 6 & 2 \\ 2 & 1 & 0 \\ 5 & 9 & 7 \\ 1 & 0 & 6 \end{pmatrix}, \qquad A' = \begin{pmatrix} 3 & 2 & 5 & 1 \\ 6 & 1 & 9 & 0 \\ 2 & 0 & 7 & 6 \end{pmatrix}$

The transpose of a row vector is a column vector, and conversely.


(А.19) MATHEMATICAL APPENDIX 405 
A square matrix is symmetric if A' = A and skew-symmetric if A' = -A.

2 я ; " В 
Тһиѕ М J is symmetric and & el is skew-symmetric. In any skew- 


symmetric matrix all the elements along the principal diagonal must be zero. 
The following theorem is sometimes referred to as the reversal rule:

(A.19.1) (AB)' = B'A'
Proof: If $c'_{ij}$ is the element in the i-th row and j-th column of (AB)',
$c'_{ij} = c_{ji} = \sum_k a_{jk} b_{ki}$
$= \sum_k a'_{kj} b'_{ik} = \sum_k b'_{ik} a'_{kj}$
which is the (i, j)-th element of B'A'. Similarly, (ABC)' = C'B'A', etc.
A square matrix in which all the elements except those in the principal
diagonal are zero is called a diagonal matrix. A diagonal matrix is symmetric,
and commutes with any other diagonal matrix having the same number of rows.

EXAMPLE 4

$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} -3 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 6 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}$

Multiplication of a matrix by a number $\lambda$ is equivalent to multiplying (either
on the left or on the right) by a diagonal matrix with each non-zero element
equal to $\lambda$.

EXAMPLE 5

$\begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix} \begin{pmatrix} 2 & 1 & 7 \\ 9 & 3 & -2 \\ 1 & 4 & -5 \end{pmatrix} = \begin{pmatrix} 2\lambda & \lambda & 7\lambda \\ 9\lambda & 3\lambda & -2\lambda \\ \lambda & 4\lambda & -5\lambda \end{pmatrix} = \lambda \begin{pmatrix} 2 & 1 & 7 \\ 9 & 3 & -2 \\ 1 & 4 & -5 \end{pmatrix}$

A diagonal matrix with each diagonal element 1 is called a unit matrix.
Multiplication by a unit matrix, on the right or on the left, leaves any other matrix
unchanged.

(A.19.2) AI = IA = A

where I is a unit matrix with the proper number of rows and columns.
A.20 The Determinant of a Matrix

If A is a square n x n matrix, the determinant of А, denoted by d(A), is a 
polynomial of the n'^ degree in the elements of А. The terms of the polynomial 
are obtained by multiplying together all possible sets of n elements taken one 
from each row and one from each column, and giving the products alternate 
plus and minus signs. It is assumed that the reader is familiar with the elementary 
properties of determinants as found in most text-books of college algebra, but we 
recall briefly a few of these properties. 

The usual notation for a determinant is

(A.20.1)    $d(A) = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = |a_{ij}|$


The determinant of $n - 1$ rows obtained by omitting the $i$th row and $j$th column of $d(A)$ is called the minor of $a_{ij}$ and will be denoted by $d(A_{ij})$. The signed minor

(A.20.2)    $C_{ij} = (-1)^{i+j} d(A_{ij})$

is called the cofactor of $a_{ij}$. The formulas for the development of $d(A)$, in terms of the $i$th row and in terms of the $j$th column, are

(A.20.3)    $d(A) = \sum_j a_{ij} C_{ij} = \sum_i a_{ij} C_{ij}$


If in these formulas we replace the cofactors of the $i$th row (or the $j$th column) by the cofactors of a different row or column (what Aitken has called alien cofactors) the expressions reduce to zero. That is,

(A.20.4)    $\sum_j a_{ij} C_{kj} = 0, \quad i \neq k; \qquad \sum_i a_{ij} C_{ik} = 0, \quad j \neq k$

A convenient symbol for expressing such pairs of relations as Eqs. (3) and (4) is the Kronecker delta, $\delta_{jk}$, defined as equal to 1 when $j = k$, and equal to 0 when $j \neq k$. Equations (3) and (4), with this notation, may be written:

(A.20.5)    $\sum_j a_{ij} C_{kj} = \delta_{ik}\, d(A); \qquad \sum_i a_{ij} C_{ik} = \delta_{jk}\, d(A)$
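The row form of this relation, that a row multiplied by its own cofactors gives $d(A)$ and by alien cofactors gives zero, can be illustrated as follows. This is a minimal sketch; the function names are illustrative, and the matrix is the one appearing in Example 5.

```python
def minor(m, i, j):
    """Matrix left after deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by development along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j))
               for j in range(len(m)))


def cofactor(m, i, j):
    """Signed minor C_ij = (-1)^(i+j) d(A_ij)."""
    return (-1) ** (i + j) * det(minor(m, i, j))


A = [[2, 1, 7],
     [9, 3, -2],
     [1, 4, -5]]
n = len(A)
d_A = det(A)

# Entry (i, k) is sum_j a_ij C_kj: d(A) when i == k, zero for alien cofactors.
row_table = [[sum(A[i][j] * cofactor(A, k, j) for j in range(n))
              for k in range(n)] for i in range(n)]
print(d_A, row_table)
```

The resulting table is $d(A)$ times the unit matrix, which is exactly what the Kronecker delta notation expresses.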




Even if a matrix is not square, we can form determinants by crossing out rows and/or columns to leave square arrays. The determinants of these arrays are all determinants of the matrix. The rank of a matrix is the order of the largest non-zero determinant that can be formed in this way. A square $n$ by $n$ matrix is called singular if the determinant of the whole matrix is zero. Its rank is then of course less than $n$.

If $A$ and $B$ are square matrices of the same size, and if $C = AB$, then

$$d(C) = d(A) \cdot d(B)$$

Although the matrices $AB$ and $BA$ are in general different they have the same determinant.
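A small numerical illustration of the product rule is sketched here. The two matrices and the helper names are assumptions made for the example.

```python
def det(m):
    """Determinant by development along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(sub)
    return total


def matmul(a, b):
    """Product of two conformable matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
AB = matmul(A, B)
BA = matmul(B, A)

# AB and BA differ as matrices but share the determinant d(A) d(B).
print(AB != BA, det(AB), det(BA), det(A) * det(B))
```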


A.21 The Inverse of a Matrix 
If an $n$ by $n$ matrix $A$ is non-singular, there exists a unique $n$ by $n$ matrix denoted by $A^{-1}$, such that

(A.21.1)    $AA^{-1} = A^{-1}A = I$

This matrix $A^{-1}$ is called the inverse of $A$.
The transpose of the matrix of cofactors of the elements of $A$ is called the adjoint of $A$, denoted by adj $A$. That is

(A.21.2)    $\operatorname{adj} A = (C_{ij})' = (C_{ji})$



It follows that

(A.21.3)    $A \operatorname{adj} A = \left( \sum_k a_{ik} C_{jk} \right) = (\delta_{ij}\, d(A))$

But since $d(A)$ is a number and $\delta_{ij}$ is the $(i, j)$th element of the unit matrix $I$, we can express the matrix on the right of Eq. (3) as $d(A) I$. We have, then,

(A.21.4)    $A^{-1} = \dfrac{\operatorname{adj} A}{d(A)}$

so that the inverse of $A$ is the adjoint of $A$ divided by its determinant (supposed non-zero). We can therefore define, for non-singular square matrices, the operations of "pre-division" and "post-division," symbolised by $A^{-1}B$ and $BA^{-1}$.
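The adjoint formula for the inverse can be sketched directly in code. Exact fractions are used so that the check $AA^{-1} = I$ holds without rounding; the function names and the test matrix are illustrative assumptions.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix left after deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by development along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j))
               for j in range(len(m)))


def adjoint(m):
    """Transpose of the matrix of cofactors: element (i, j) is C_ji."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)]
            for i in range(n)]


def inverse(m):
    """Inverse as adj A divided by d(A); fails for a singular matrix."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(c, d) for c in row] for row in adjoint(m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


A = [[2, 1, 7],
     [9, 3, -2],
     [1, 4, -5]]
A_inv = inverse(A)
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
print(matmul(A, A_inv) == identity, matmul(A_inv, A) == identity)
```

This route is practical only for small matrices, since the number of cofactor evaluations grows very quickly with $n$.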


The reversal rule applies to inversion, namely,

(A.21.5)    $(AB)^{-1} = B^{-1}A^{-1}$

To prove this, we note that

$$(AB)(AB)^{-1} = I = AA^{-1} = A(BB^{-1})A^{-1} = (AB)(B^{-1}A^{-1})$$

where we have used the associative law for matrix multiplication.
The operations of transposition and inversion are commutative. That is,

(A.21.6)    $(A^{-1})' = (A')^{-1}$

Proof: $A'(A^{-1})' = (A^{-1}A)' = I' = I$, which shows that $(A^{-1})'$ is the inverse of $A'$.

The elements of $A^{-1}$ are conveniently written as $a^{ij}$, using superscripts instead of subscripts. From Eqs. (2) and (4) it follows that

(A.21.7)    $a^{ij} = \dfrac{C_{ji}}{d(A)}$

Given the set of normal equations in § 12.1, namely, $Ab = g$, we can solve for $b$ by multiplying both sides on the left by $A^{-1}$. This gives $b = A^{-1}g$, or, using the notation of Eq. (7),

(A.21.8)    $b_i = \sum_j a^{ij} g_j = \left( \sum_j C_{ji}\, g_j \right) \Big/ d(A)$

Since $\sum_j C_{ji}\, g_j$ is the expanded form of $d(A)$ in which the $i$th column has been replaced by a column of $g$'s, this equation is a statement of Cramer's rule (§ 12.3).
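Cramer's rule as stated here can be sketched in a few lines. The function names are illustrative, and the system used is the numerical example of § A.24, chosen only so that the result can be compared with the worked solution there.

```python
from fractions import Fraction


def det(m):
    """Determinant by development along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(sub)
    return total


def cramer(a, g):
    """Solve Ab = g: b_i is d(A with column i replaced by g) over d(A)."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    solution = []
    for i in range(len(a)):
        replaced = [row[:i] + [g[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(replaced), d))
    return solution


A = [[58, 23, 43],
     [23, 78, 168],
     [43, 168, 1096]]
g = [160, 1910, 240]
b = cramer(A, g)
print([round(float(x), 3) for x in b])
```

The exact fractions agree with the three-decimal hand computation of § A.24 to within a few units in the third decimal place, the difference being rounding in the hand work.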


A.22 Orthogonal Matrices 


A non-singular square matrix $A$ is orthogonal if its transpose is equal to its inverse, that is, if

(A.22.1)    $AA' = I$

The matrix of the coefficients of an orthogonal transformation (§ A.10) is orthogonal. Thus the transformation expressed by Eqs. (A.10.1) and (A.10.2) can be written in matrix form

(A.22.2)    $Y = CX, \quad CC' = I$

where $Y$ and $X$ are column vectors. An example is the transformation

$$Y_1 = X_1 \cos\theta - X_2 \sin\theta$$
$$Y_2 = X_1 \sin\theta + X_2 \cos\theta$$

which corresponds to a clockwise rotation of the axes about the origin through an angle $\theta$. The matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ is orthogonal.
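The orthogonality of the rotation matrix can be confirmed numerically, as in this short sketch. The angle 0.7 radians and the vector $X$ are arbitrary illustrative choices.

```python
import math


def rotation(theta):
    """Coefficient matrix C of the rotation through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


C = rotation(0.7)
product = matmul(C, transpose(C))          # should be the unit matrix

X = [3.0, 4.0]
Y = [sum(c * x for c, x in zip(row, X)) for row in C]

# An orthogonal transformation leaves the sum of squares unchanged.
print(product, sum(x * x for x in X), sum(y * y for y in Y))
```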


A.23 Calculation of the Inverse of a Matrix 


The solution of a set of normal equations (§ 12.1) is immediately expressible in terms of the inverse matrix of the coefficients. Moreover, some of the elements of the inverse matrix are required for testing the significance of the partial regression coefficients $b$. It is therefore sometimes worth while to invert a matrix. In practice the matrix is usually symmetric.

With a square matrix of three rows and columns, it is a simple matter to compute the cofactors and the whole determinant, and thus obtain the inverted matrix directly from its definition (§ A.21). However, for larger matrices a more systematic method is desirable. The following method is known as Jordan's.


If we can find a matrix $J$ such that

(A.23.1)    $J(A, I) = (I, J)$

then $J$ is $A^{-1}$. Here $I$ is a unit matrix of the same number of rows as $A$, placed alongside $A$. The method consists in multiplying the augmented matrix $(A, I)$ by successive matrices of the form $I + J_i$, where $J_i$ differs from a zero matrix only in the $i$th column. By suitably choosing the non-zero elements of $J_i$ we can build up a unit matrix $I$ on the left of the product matrix and the rest of the product matrix is then the required inverse, $J$. For a $4 \times 4$ matrix we should have

$$(I + J_1)(A, I) = (A_1, K_1)$$
$$(I + J_2)(A_1, K_1) = (A_2, K_2)$$
$$(I + J_3)(A_2, K_2) = (A_3, K_3)$$
$$(I + J_4)(A_3, K_3) = (I, A^{-1})$$

The first column of $I + J_1$ is chosen so that the first column of $A_1$ has elements (read downwards) 1, 0, 0, 0, and so need not be recorded. This first column of $I + J_1$ becomes the first column of $K_1$ and is recorded there. The second column of $I + J_2$ is chosen so that the second column of $A_2$ becomes 0, 1, 0, 0, and the second column of $K_2$ becomes this second column of $I + J_2$, and so on. In this way the unit matrix $I$ is built up in four steps, but need not be recorded.

If the elements of the first column of $A$ (read downwards) are $a_1, a_2, a_3, a_4$, those of the first column of $I + J_1$ are $1/a_1$, $-a_2/a_1$, $-a_3/a_1$, $-a_4/a_1$. If the elements of the second column of $A_1$ are $a'_1, a'_2, a'_3, a'_4$, then those of the second column of $I + J_2$ are $-a'_1/a'_2$, $1/a'_2$, $-a'_3/a'_2$, $-a'_4/a'_2$. The other products are formed similarly.


EXAMPLE 6  The steps in the calculation of the inverse of a $4 \times 4$ matrix are given below. At each step the matrix $I + J_i$ has been shown in full for the sake of clearness, but it is not actually necessary to set it down. Unit columns in the augmented matrix have, however, been omitted (their places are shown by dots). The pivotal elements in the key columns of $A$, $A_1$, $A_2$, and $A_3$ are marked with an asterisk. Thus $a_1 = 1.0$, $a'_2 = 0.84$, $a''_3 = 0.7381$, $a'''_4 = 0.5903$. These are the elements whose reciprocals are used in forming the corresponding columns of $I + J_i$.

              I + J_1                 |                    A
   1.0       0        0        0      |   1.0*     0.4      0.5      0.6
  -0.4       1        0        0      |   0.4      1.0      0.3      0.4
  -0.5       0        1        0      |   0.5      0.3      1.0      0.2
  -0.6       0        0        1      |   0.6      0.4      0.2      1.0

              I + J_2                 |                (A_1, K_1)
   1      -0.4762     0        0      |   .    0.40     0.50     0.60     1.00
   0       1.1905     0        0      |   .    0.84*    0.10     0.16    -0.40
   0      -0.1190     1        0      |   .    0.10     0.75    -0.10    -0.50
   0      -0.1905     0        1      |   .    0.16    -0.10     0.64    -0.60

              I + J_3                 |                (A_2, K_2)
   1       0      -0.6129     0       |   .   .    0.4524    0.5238    1.1905   -0.4762
   0       1      -0.1612     0       |   .   .    0.1190    0.1905   -0.4762    1.1905
   0       0       1.3548     0       |   .   .    0.7381*  -0.1190   -0.4524   -0.1190
   0       0       0.1612     1       |   .   .   -0.1190    0.6095   -0.5238   -0.1905

              I + J_4                 |                (A_3, K_3)
   1       0       0      -1.0108     |   .   .   .    0.5967    1.4678   -0.4033   -0.6129
   0       1       0      -0.3552     |   .   .   .    0.2097   -0.4033    1.2097   -0.1612
   0       0       1       0.2731     |   .   .   .   -0.1612   -0.6129   -0.1612    1.3548
   0       0       0       1.6940     |   .   .   .    0.5903*  -0.5967   -0.2097    0.1612

                                      |                  A^{-1}
                                      |    2.0709   -0.1913   -0.7758   -1.0108
                                      |   -0.1913    1.2842   -0.2185   -0.3552
                                      |   -0.7758   -0.2185    1.3988    0.2731
                                      |   -1.0108   -0.3552    0.2731    1.6940

Since the original matrix $A$ was symmetric, the inverse $A^{-1}$ is also symmetric. It is therefore unnecessary to compute separately the terms of $A^{-1}$ below the main diagonal, except as checks on the arithmetic.
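The successive multiplications by $I + J_i$ can be sketched in code as follows. This is a minimal illustration, assuming that no pivot is zero (it performs no interchange of rows); the function name `jordan_inverse` is hypothetical, and the matrix is the one used in Example 6.

```python
def jordan_inverse(a):
    """Invert a square matrix by premultiplying (A, I) by I + J_1, I + J_2, ..."""
    n = len(a)
    # Augmented matrix (A, I).
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for i in range(n):
        pivot = aug[i][i]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not interchange rows")
        # i-th column of I + J_i: 1/pivot in row i, -a_r/pivot in every other row.
        col = [-aug[r][i] / pivot for r in range(n)]
        col[i] = 1.0 / pivot
        pivot_row = aug[i][:]
        # Premultiplying by I + J_i adds col[r] times the pivot row to row r
        # and replaces row i by col[i] times the pivot row.
        for r in range(n):
            if r == i:
                aug[r] = [col[i] * x for x in pivot_row]
            else:
                aug[r] = [x + col[r] * p for x, p in zip(aug[r], pivot_row)]
    # The left half is now the unit matrix; the right half is the inverse.
    return [row[n:] for row in aug]


A = [[1.0, 0.4, 0.5, 0.6],
     [0.4, 1.0, 0.3, 0.4],
     [0.5, 0.3, 1.0, 0.2],
     [0.6, 0.4, 0.2, 1.0]]
A_inv = jordan_inverse(A)
for row in A_inv:
    print([round(x, 4) for x in row])
```

Working in full floating-point precision, the result should agree with the four-decimal figures of Example 6 apart from small rounding differences in the last place.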


A.24 Solution of a Set of Normal Equations by the Square Root Method

The given equation, $Ab = g$, can be solved if we can find a triangular matrix $S$ (that is, a square matrix with all the elements below, or above, the principal diagonal equal to zero) such that

(A.24.1)    $S'S = A$

If so, we have

$$S'(Sb) = g$$

which is equivalent to the two matrix equations

(A.24.2)    $Sb = k, \quad S'k = g$

Since $S$ and $S'$ are triangular, these equations are comparatively easy to solve.
The first step is to find the elements of $S$ by solving the equations, equivalent to Eq. (1),

(A.24.3)
$$s_{11}^2 = a_{11}$$
$$s_{11} s_{12} = a_{12}$$
$$\cdots\cdots$$
$$s_{12}^2 + s_{22}^2 = a_{22}$$
$$\cdots\cdots$$



These equations are solved in order, beginning with the first.
The second step is to find the elements of $k$ by solving $S'k = g$, which, when written out, is:

(A.24.4)
$$s_{11} k_1 = g_1$$
$$s_{12} k_1 + s_{22} k_2 = g_2$$
$$\cdots\cdots$$
$$s_{1p} k_1 + s_{2p} k_2 + \cdots + s_{pp} k_p = g_p$$

The final step is to solve the set equivalent to $Sb = k$, namely,

$$s_{11} b_1 + s_{12} b_2 + \cdots + s_{1p} b_p = k_1$$
$$s_{22} b_2 + \cdots + s_{2p} b_p = k_2$$
$$\cdots\cdots$$
$$s_{p-1,p-1} b_{p-1} + s_{p-1,p} b_p = k_{p-1}$$
$$s_{pp} b_p = k_p$$

These are solved backwards, beginning with the last equation, obtaining first $b_p$, then $b_{p-1}$, and so on. The process may be illustrated by the following example:

$$58b_1 + 23b_2 + 43b_3 = 160$$
$$23b_1 + 78b_2 + 168b_3 = 1910$$
$$43b_1 + 168b_2 + 1096b_3 = 240$$

The various steps, including a check column, may be set out in a table:

TABLE A.24

                                          g        $\hat{g}$
  A        58        23        43       160         284
           23        78       168      1910        2179
           43       168      1096       240        1547
                                          k        $\hat{k}$
  S      7.616     3.020     5.646    21.008      37.290
                   8.299    18.189   222.503     248.991
                            27.079  -144.972    -117.893
  b     -8.557    38.545    -5.354
  $\hat{b}$   -7.557    39.545    -4.354
Step 1

$$s_{11} = (58)^{1/2} = 7.616$$
$$s_{12} = \frac{23}{7.616} = 3.020$$
$$s_{13} = \frac{43}{7.616} = 5.646$$
$$s_{22} = [78 - (3.020)^2]^{1/2} = (68.88)^{1/2} = 8.299$$
$$s_{23} = \frac{168 - (3.020)(5.646)}{8.299} = 18.189$$
$$s_{33} = [1096 - (5.646)^2 - (18.189)^2]^{1/2} = 27.079$$

Step 2

$$k_1 = \frac{160}{7.616} = 21.008$$
$$k_2 = \frac{1910 - (3.020)(21.008)}{8.299} = 222.503$$
$$k_3 = \frac{240 - (5.646)(21.008) - (18.189)(222.503)}{27.079} = -144.972$$

Step 3

$$b_3 = \frac{-144.972}{27.079} = -5.354$$
$$b_2 = \frac{222.503 - (18.189)(-5.354)}{8.299} = 38.545$$
$$b_1 = \frac{21.008 - (3.020)(38.545) - (5.646)(-5.354)}{7.616} = -8.557$$


The check consists in computing a vector $\hat{g}$ whose elements are the sums of the corresponding rows in $A$ and $g$. Thus

$$\hat{g}_1 = 58 + 23 + 43 + 160 = 284$$

Steps 2 and 3 are repeated with $\hat{g}$ instead of $g$, giving new vectors $\hat{k}$ and $\hat{b}$. Apart from rounding-off errors in the last decimal place, we should find

$$\hat{k}_i = s_{ii} + s_{i,i+1} + \cdots + s_{ip} + k_i$$

and

$$\hat{b}_i = b_i + 1$$

In practice, each row of $S$ in Table A.24 is completed, and the check applied, before the next row is started.
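The three steps and the check column can be sketched as follows. This is a minimal illustration assuming that $A$ is symmetric and positive definite, so that every square root is real; the function name `square_root_solve` is hypothetical, and the data are those of the example above.

```python
import math


def square_root_solve(a, g):
    """Solve Ab = g via an upper-triangular S with S'S = A."""
    p = len(a)
    s = [[0.0] * p for _ in range(p)]
    # Step 1: elements of S, row by row.
    for i in range(p):
        for j in range(i, p):
            rest = a[i][j] - sum(s[m][i] * s[m][j] for m in range(i))
            s[i][j] = math.sqrt(rest) if i == j else rest / s[i][i]
    # Step 2: solve S'k = g forwards.
    k = [0.0] * p
    for i in range(p):
        k[i] = (g[i] - sum(s[m][i] * k[m] for m in range(i))) / s[i][i]
    # Step 3: solve Sb = k backwards.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (k[i] - sum(s[i][j] * b[j] for j in range(i + 1, p))) / s[i][i]
    return s, k, b


A = [[58, 23, 43],
     [23, 78, 168],
     [43, 168, 1096]]
g = [160, 1910, 240]

s, k, b = square_root_solve(A, g)

# Check column: row sums of A and g; the check solution should exceed b by 1.
g_hat = [sum(row) + gi for row, gi in zip(A, g)]
_, k_hat, b_hat = square_root_solve(A, g_hat)

print([round(x, 3) for x in b])
print([round(x, 3) for x in b_hat])
```

Because the code carries full floating-point precision, its figures may differ from the three-decimal hand computation by a few units in the last place.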


Appendix B — TABLES 


TABLE B.1  RANDOM SAMPLING NUMBERS*


First Thousand
1-4 5-8 9-12 13-16 17-20 21-24 25-28 29-32 33-36 37-40
. 2315 7548 590: 8372 5993 7624 9708 8695 2303 6744 
о5 54 5550 4310 5374 3508 9061 1837 4410 9622 1343 
14 87 1603 5032 40 43 6223 5005 1003 2211 54 38 0854 
3897 6749 5194 0517 5853 7880 5901 9432 4287 1695 
9731 2617 1899 7553 0870 9425 1258 41 54 8821 05 13 
3393 08 72 3279 7331 1822 6470 68 so 
4336 1288 5911 or64 5623 9300 9004 99 43 6407 4036 
93 80 6204 7838 2680 4491 5575 11 89 3258 4755 2571 
49 54 or31 8108 42 98 4187 6953 82 96 6177 7380 952 
3676 8726 3337 9482 1569 4195 9686 7045 27 48 38 80 


11 0709 2523 9224 6271 2607 0655 8453 44 67 3384 5320 
12 4331 0010 8144 8638 0307 5255 51 61 4889 7429 4647 
13 6157 0063 6006 1736 3775 6314 8951 2335 0174 6993 
I4 3135 2837 9910 7791 8941 3157 97 64 4862 35848 6919 
79 565 4635 0653 2254 


72 
17 9795 5350 18 40 8948 8329 522 o825 2122 5326 1587 
18 9373 2595 7043 7819 8885 5667 1668 2695 99 64 45 69 
19 7262 1112 2500 9226 32 64 3566 6594 3471 68 75 18 67 
37 7378 6699 5361 93 78 


22 8916 097I 9222 232 0637 3505 5454 89 88 4381 63 
23 2596 6882 2062 8717 92 65 0282 3528 6284 9195 48 83 
24 8144 3317 1905 0495 4806 7469 0075 6765 o1 71 6545 
25 1132 2549 31 42 3623 43 86 08 62 4976 6742 2452 3245 
Second Thousand 
1-4 5-8 9-12 13-16 17-20 21-24 25-28 29-32 33-36 37-40
6475 5838 8584 1222 5920 1769 6156 55 
1030 2522 8977 4363 4439 38 11 2490 67 
уз от 7984 95 51 3085 0374 6659 1028 8753 7656 9149 


боо: 25 56 0588 4103 4879 7965 5901 69 78 8000 
3733 09 46 5649 1614 2802 4827 4547 5544 5536 5090 


I 
2 
3 
4 
à 
6 47 86 9870 озі 5911 2273 бо б2 6128 22 34 
7 
8 
9 
то 


3804 0427 3764 1678 9578 3932 3493 2488 4343 
og o8 83 05 48 oo 78 3666 9302 95 56 4604 53 36 
84 o6 10 4324 2062 8373 1932 35 64 39 69 


9759 1995 4936 6303 5106 62 06 9929 7595 3295 77 34 


II 7401 2319 5559 7909 6982 66 22 4240 1596 74 90 75 89 
12 5675 42 64 5713 3510 5014 999 
13 4980 0499 08 54 83 12 19 98 08 52 82 63 7292 92 36 50 26 


14 8 4896 4724 8785 6670 0022 
di we 4 3574 2836 3673 0588 7229 


16 4850 2690 5565 3225 3144 
17 9676 5546 92 36 3168 6230 4829 63 83 
18 3892 3615 5080 3578 1784 2344 41 24 6333 


I 8 16 25 2250 5587 5107 
AE d pu o264 1850 6465 79 64 81 70 


41 05 
22 4746 0604 7956 2304 8417 1437 
23 4785 6560 8851 9928 2439 4064 4171 79 13 46 31 
29 86 2018 1037 5765 1562 98 69 0756 
5718 8791 0754 22 22 


*Reproduced with the permission of Professor E. S. Pearson from M. G. Kendall and B. Babington Smith, Tables of Random Sampling Numbers (Tracts for Computers, No. 24), Cambridge Univ. Press.
TABLE B.1 (cont.)
RANDOM SAMPLING NUMBERS


Third Thousand
1-4 5-8 9-12 13-16 17-20 21-24 25-28 29-32 33-36 37-40
8922 1023 6265 7877 4733 5127 2302 1392 4413 9651 
0400 5998 1863 ог 82 go 32 9401 2423 6301 2611 0650 
98 54 6380 6650 8567 sois 4064 5228 41 53 
4171 9844 0159 2260 1314 5458 1403 9849 9886 5579 
2873 3724 8900 7852 5843 2401: 3497 9785 5678 4471 



Fourth Thousand 
Ij-16 17-20 21-24 
9443 9364 0423 
6278 3701 09 25 
69 92 40 79 05 40 
48 12 35 36 04 88 


43 07 0722 8652 


3759 3431 4320 
5°59 7746 34 66 
5618 01 46 9313 
4081 4754 5179 
4977 озот зо то 


33 93 3660 4275 
5172 6590 44 43 
7487 8803 38 33 
5710 2875 21 82 
6079 2553 2900 
4196 8610 49 12 
6672 7645 4632 
29 94 09 74 42 39 
5936 1995 7986 
3177 8710 7382 
4795 7017 59 33 
3115 0053 25 36 
26 5I 3050 ут ori 
6014 6377 5993 
72 79 5773 72 36 


29 93 
15 II 
оз 87 
79 39 
3o өз 
29 03 
78 19 
15 84 
9 36 61i 
I0 40 54 
Ir 4087 
12 1022 
T3 1591 
14 1340 
T5 66 52 


16 9r 66 
17 67 41 
18 76 52 
19 19 81 
20 25 50 


55 90 
92 47 
18 63 
89 67 
62 98 


SNA Ca CNN 


ююььь 


vid C ьм 


I-4 
27 50 
02 31 
37 43 
83 56 
06 81 


39 15 
84 45 
82 47 
98 04 
18 33 
Ir 

33 92 
тг 48 66 
13 85 85 
14 08 27 
T5 59 61 


I 

17 x a 
18 48 08 
19 7627 
20 98 89 
2I 88 
d» d. 
?3 o8 86 
?4 33 81 


хоча GA GS MN 


TABLE B.1 (cont.)
RANDOM SAMPLING NUMBERS


13-16 
17 55 
89 07 
58 98 
75 64 
19 65 


75 97 
10 74 
44 29 
19 49 
46 54 
28 бо 
17 33 
62 42 
90 99 
9o 89 


34 31 
20 77 
72 43 
34 81 
38 47 
64 63 
79 42 
31 99 
об 39 
59 51 




Fifth Thousand 


17-20 
25 79 
77 87 
76 29 
52 69 
44 28 
98 o2 
97 77 
13 5I 
72 09 
38 62 
99 82 
14 68 
59 28 
44 04 
02 71 
78 70 
37 29 
34 48 
7147 
25 75 
16 09 
24 82 
76 19 
20 07 
74 27 


Sixth Thousand 


13-16 
63 85 
41 09 
77 33 
35 42 
12 42 


73 16 
15 83 
47 13 
69 02 
48 50 
43 40 
об 80 
31 80 
96 57 
96 75 


36 54 
95 95 
41 15 
72 15 
53 55 
61 88 
от 35 
43 85 
бо 12 
07 o8 


17-20 
87 бо 
66 отг 
63 26 
92 12 
92 42 
48 74 
84 20 
92 85 
65 42 
15 64 
27 72 
29 09 
то 19 
33 12 
17 94 
92 85 
39 75 
73 96 
oo 25 
o7 98 


58 79 
91 16 
5.1 20 
32 44 
66 92 


21-24 
35 55 
69 88 
53 99 
37 14 
05 96 
50 27 
57 42 
бо 12 


55 33 
58 26 


79 74 
50 31 
56 65 
от 77 
51 o8 


65 60 
92 48 
32 55 
2154 
66 71 
35 65 
18 36 
65 18 
o8 12 
53 81 


33-36 
42 82 
94 24 
12 76 
28 42 
72 18 


15 72 
82 09 
5391 
o2 18 
89 47 
58 71 
до 80 
13 77 
15 59 
64 об 
27 25 
91 76 
88 79 
19 45 
73 95 
66 66 
49 1I 
63 28 
45 03 
86 o5 


33-36 
47 00 
76 59 
94 29 
98 57 
45 49 


73 16 
15 90 
42 38 
73 23 
72 13 
58 71 
28 89 
55 12 
71 12 
8o 48 
11 20 
19 56 
88 77 
58 50 
56 96 
69 86 
68 36 
48 10 
79 17 
65 33 




37-40 
13 63 
81 11 
5022 
29 60 
15 94 


43 73 
49 56 
од 86 
oo 64 
41 8o 


66 58 
66 60 
16 14 
83 35 
89 09 
06 20 
96 99 
06 17 
44 07 
72 22 
66 92 
72 64 
86 59 
33 33 
73 00 


37-40 
50 92 
o2 58 
53 94 
12 52 
18 16 


39 90 
7947 
8737 
57 26 
48 62 


56 99 
97 79 
26 34 
17 69 
59 92 
96 63 
об 67 
17 03 
57 66 
41 78 
79 47 
85 o6 
68 97 
97 22 
57 69 
TABLE B.1 (cont.)
RANDOM SAMPLING NUMBERS


Seventh Thousand

1-4 5-8 9-12 13-16 17-20 21-24 25-28 29-32 33-36 37-40
Зо зо 23 64 6796 2133 3690 0391 69'33 9013 3448 o2 19 
6:29 8961 3208 1262 2608 4200 3173 3130 3061 34 11 
2333 6101 0221 1181 5132 3610 2374 5031 доІІ 7352 
9421 3292 9350 7267 2320 7459 3030 4866 7532 2797 
87 бт 9269 ог бо 2879 7476 8606 3929 7385 0327 5057 


3756 1918 0342 8603 8574 4481 8645 7116 1352 3556 
6486 6631 5504 8840 1030 8438 0613 58 83 6204 63 52 
2269 5845 4923 0981 9884 0504 7599 2770 7279 3219 
2322 1422 6490 1026 7423 5391 2773 7819 9243 6810 
42 38 5964 7296 4657 8967 2281 9456 6984 1831 0639 


17 18 0134 1098 3748 9386 8859 6953 78 86 3726 85 48 
5897 2933 2919 5094 8057 3199 389: 
4318 1142 5619 4844 4502 8429 oi 78 6577 7684 8885 
5944 0645 6855 1665 66 13 3800 9576 5067 6765 18 83 
9150 3432 3800 3757 4782 6659 1950 8714 3559 7947 


79 14 6035 4795 9071 3103 8537 3870 3416 6455 6649 
от 566 6368 8026 1497 2388 5922 8239 7o 83 4834 4648 
2576 1871 2925 1551 92 96 oror 2818 0335 1110 27 84 
2352 1083 4506 4985 3545 8408 8113 5257 2123 6702 
20 9164 0864 2574 1610 9731 10 27 2448 8906 4281 2910 


21 8086 0727 2670 08 65 8520 31 23 28 99 39 63 3203 7191 
22 3171 3760 9560 9495 5445 2797 0367 3054 8604 12 41 
23 0583 5036 09 04 3915 66 55 8036 3971 2410 6222 21 53 
24 9870 0290 3063 6259 26 04 97 20 ооо: 28 80 4023 09 91 
25 8279 3545 6453 9324 8655 4872 18 57 9579 2009 31 46 


Eighth Thousand 

1-4 5-8 9-12 13-16 17-20 21-24 25-28 29-32 33-36 37-40
I 3752 4955 4065 2761 08 50 9123 2618 950 98 20 99 52 
2 4816 6965 6902 08 83 08 83 6837 оо 96 pe 12 16 1793 
3 5943 0659 5653 зобі 402I 29 06 4960 до 38 2143 1925 
4 8931 6279 4573 7172 7711 2880 72 35 75 77 2472 98 43 
5 6329 9061 8639 0738 38 85 7706 1023 3084 0795 3076 
6 
7 
8 
9 



7168 9394 0872 3627 8589 4059 8337 93 8 840 
05 об 9663 58 24 0595 5664 7753 85 64 15 M. 55 HH 59 03 
9335 5895 4644 2570 3166 oros 4444 6291 3631 45 04 
1304 5767 7477 5335 9351 8283 2738 63 16 0448 7523 
I0 4996 4394 5604 0279 5578 or44 75 26 8554 o1 81 32 82 


II 2436 2408 4477 5707 5441 0456 0944 3058 2 6 
12 5519 9720 ©0111 4745 7979 0672 12 81 86 97 sane ob 33 
13 0228 5460 2835 3294 36 74 5163 9690 o4 13 3043 IO I4 
14 9050 1378 2220 3756 9795 4995 9115 5273 1293 7894 
15 3371 3243 2958 4738 3996 6751 6447 49 91 6458 9307 
16 7058 2849 5432 9770 2781 64 69 7152 oz 

17 0968 9610 5778 8500 8981 98 30 9 76 LH i Bs eje 
18 1936 6085 3504 1287 8388 6654 3200 30 20 o530 42 63 
I9 0475 4449 6426 5146 8050 5391 0055 6736 68 0 G8 29 
20 7983 3239 4677 5683 422: 6003 1447 0701 6685 4922 


21 8099 4243 0858 5441 9805 5439 3442 

га 4883 6499 8694 4878 7920 6223 1643 07 47 3835 5242 
23 2845 3585 2220 1301 7396 7005 8450 68 59 96 58 16 63 
24 5207 6315 8230 6623 1426 6661 1780 4197 4o 27 24 80 
25 3914 5218 3587 4855 4881 osi: 26 99 03/80 0856 zák 
TABLE B.2

ORDINATES AND AREAS OF THE NORMAL CURVE, $\phi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$ || z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$ || z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$

.00 | .39894 | .00000 || .45 | .36053 | .17364 || .90 | .26609 | .31594
.01 | .39892 | .00399 || .46 | .35889 | .17724 || .91 | .26369 | .31859
.02 | .39886 | .00798 || .47 | .35723 | .18082 || .92 | .26129 | .32121
.03 | .39876 | .01197 || .48 | .35553 | .18439 || .93 | .25888 | .32381
.04 | .39862 | .01595 || .49 | .35381 | .18793 || .94 | .25647 | .32639

.05 | .39844 | .01994 || .50 | .35207 | .19146 || .95 | .25406 | .32894
.06 | .39822 | .02392 || .51 | .35029 | .19497 || .96 | .25164 | .33147
.07 | .39797 | .02790 || .52 | .34849 | .19847 || .97 | .24923 | .33398
.08 | .39767 | .03188 || .53 | .34667 | .20194 || .98 | .24681 | .33646
.09 | .39733 | .03586 || .54 | .34482 | .20540 || .99 | .24439 | .33891

.10 | .39695 | .03983 || .55 | .34294 | .20884 || 1.00 | .24197 | .34134
.11 | .39654 | .04380 || .56 | .34105 | .21226 || 1.01 | .23955 | .34375
.12 | .39608 | .04776 || .57 | .33912 | .21566 || 1.02 | .23713 | .34614
.13 | .39559 | .05172 || .58 | .33718 | .21904 || 1.03 | .23471 | .34850
.14 | .39505 | .05567 || .59 | .33521 | .22240 || 1.04 | .23230 | .35083

.15 | .39448 | .05962 || .60 | .33322 | .22575 || 1.05 | .22988 | .35314
.16 | .39387 | .06356 || .61 | .33121 | .22907 || 1.06 | .22747 | .35543
.17 | .39322 | .06749 || .62 | .32918 | .23237 || 1.07 | .22506 | .35769
.18 | .39253 | .07142 || .63 | .32713 | .23565 || 1.08 | .22265 | .35993
.19 | .39181 | .07535 || .64 | .32506 | .23891 || 1.09 | .22025 | .36214

.20 | .39104 | .07926 || .65 | .32297 | .24215 || 1.10 | .21785 | .36433
.21 | .39024 | .08317 || .66 | .32086 | .24537 || 1.11 | .21546 | .36650
.22 | .38940 | .08706 || .67 | .31874 | .24857 || 1.12 | .21307 | .36864
.23 | .38853 | .09095 || .68 | .31659 | .25175 || 1.13 | .21069 | .37076
.24 | .38762 | .09483 || .69 | .31443 | .25490 || 1.14 | .20831 | .37286

.25 | .38667 | .09871 || .70 | .31225 | .25804 || 1.15 | .20594 | .37493
.26 | .38568 | .10257 || .71 | .31006 | .26115 || 1.16 | .20357 | .37698
.27 | .38466 | .10642 || .72 | .30785 | .26424 || 1.17 | .20121 | .37900
.28 | .38361 | .11026 || .73 | .30563 | .26730 || 1.18 | .19886 | .38100
.29 | .38251 | .11409 || .74 | .30339 | .27035 || 1.19 | .19652 | .38298

.30 | .38139 | .11791 || .75 | .30114 | .27337 || 1.20 | .19419 | .38493
.31 | .38023 | .12172 || .76 | .29887 | .27637 || 1.21 | .19186 | .38686
.32 | .37903 | .12552 || .77 | .29659 | .27935 || 1.22 | .18954 | .38877
.33 | .37780 | .12930 || .78 | .29431 | .28230 || 1.23 | .18724 | .39065
.34 | .37654 | .13307 || .79 | .29200 | .28524 || 1.24 | .18494 | .39251

.35 | .37524 | .13683 || .80 | .28969 | .28814 || 1.25 | .18265 | .39435
.36 | .37391 | .14058 || .81 | .28737 | .29103 || 1.26 | .18037 | .39617
.37 | .37255 | .14431 || .82 | .28504 | .29389 || 1.27 | .17810 | .39796
.38 | .37115 | .14803 || .83 | .28269 | .29673 || 1.28 | .17585 | .39973
.39 | .36973 | .15173 || .84 | .28034 | .29955 || 1.29 | .17360 | .40147

.40 | .36827 | .15542 || .85 | .27798 | .30234 || 1.30 | .17137 | .40320
.41 | .36678 | .15910 || .86 | .27562 | .30511 || 1.31 | .16915 | .40490
.42 | .36526 | .16276 || .87 | .27324 | .30785 || 1.32 | .16694 | .40658
.43 | .36371 | .16640 || .88 | .27086 | .31057 || 1.33 | .16474 | .40824
.44 | .36213 | .17003 || .89 | .26848 | .31327 || 1.34 | .16256 | .40988
TABLE B.2 (cont.)

ORDINATES AND AREAS OF THE NORMAL CURVE, $\phi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$ || z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$ || z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$

1.35 | .16038 | .41149 || 1.80 | .07895 | .46407 || 2.25 | .03174 | .48778
1.36 | .15822 | .41309 || 1.81 | .07754 | .46485 || 2.26 | .03103 | .48809
1.37 | .15608 | .41466 || 1.82 | .07614 | .46562 || 2.27 | .03034 | .48840
1.38 | .15395 | .41621 || 1.83 | .07477 | .46638 || 2.28 | .02965 | .48870
1.39 | .15183 | .41774 || 1.84 | .07341 | .46712 || 2.29 | .02898 | .48899

1.40 | .14973 | .41924 || 1.85 | .07206 | .46784 || 2.30 | .02833 | .48928
1.41 | .14764 | .42073 || 1.86 | .07074 | .46856 || 2.31 | .02768 | .48956
1.42 | .14556 | .42220 || 1.87 | .06943 | .46926 || 2.32 | .02705 | .48983
1.43 | .14350 | .42364 || 1.88 | .06814 | .46995 || 2.33 | .02643 | .49010
1.44 | .14146 | .42507 || 1.89 | .06687 | .47062 || 2.34 | .02582 | .49036

1.45 | .13943 | .42647 || 1.90 | .06562 | .47128 || 2.35 | .02522 | .49061
1.46 | .13742 | .42786 || 1.91 | .06439 | .47193 || 2.36 | .02463 | .49086
1.47 | .13542 | .42922 || 1.92 | .06316 | .47257 || 2.37 | .02406 | .49111
1.48 | .13344 | .43056 || 1.93 | .06195 | .47320 || 2.38 | .02349 | .49134
1.49 | .13147 | .43189 || 1.94 | .06077 | .47381 || 2.39 | .02294 | .49158

1.50 | .12952 | .43319 || 1.95 | .05959 | .47441 || 2.40 | .02239 | .49180
1.51 | .12758 | .43448 || 1.96 | .05844 | .47500 || 2.41 | .02186 | .49202
1.52 | .12566 | .43574 || 1.97 | .05730 | .47558 || 2.42 | .02134 | .49224
1.53 | .12376 | .43699 || 1.98 | .05618 | .47615 || 2.43 | .02083 | .49245
1.54 | .12188 | .43822 || 1.99 | .05508 | .47670 || 2.44 | .02033 | .49266

1.55 | .12001 | .43943 || 2.00 | .05399 | .47725 || 2.45 | .01984 | .49286
1.56 | .11816 | .44062 || 2.01 | .05292 | .47778 || 2.46 | .01936 | .49305
1.57 | .11632 | .44179 || 2.02 | .05186 | .47831 || 2.47 | .01888 | .49324
1.58 | .11450 | .44295 || 2.03 | .05082 | .47882 || 2.48 | .01842 | .49343
1.59 | .11270 | .44408 || 2.04 | .04980 | .47932 || 2.49 | .01797 | .49361

1.60 | .11092 | .44520 || 2.05 | .04879 | .47982 || 2.50 | .01753 | .49379
1.61 | .10915 | .44630 || 2.06 | .04780 | .48030 || 2.51 | .01709 | .49396
1.62 | .10741 | .44738 || 2.07 | .04682 | .48077 || 2.52 | .01667 | .49413
1.63 | .10567 | .44845 || 2.08 | .04586 | .48124 || 2.53 | .01625 | .49430
1.64 | .10396 | .44950 || 2.09 | .04491 | .48169 || 2.54 | .01585 | .49446

1.65 | .10226 | .45053 || 2.10 | .04398 | .48214 || 2.55 | .01545 | .49461
1.66 | .10059 | .45154 || 2.11 | .04307 | .48257 || 2.56 | .01506 | .49477
1.67 | .09893 | .45254 || 2.12 | .04217 | .48300 || 2.57 | .01468 | .49492
1.68 | .09728 | .45352 || 2.13 | .04128 | .48341 || 2.58 | .01431 | .49506
1.69 | .09566 | .45449 || 2.14 | .04041 | .48382 || 2.59 | .01394 | .49520

1.70 | .09405 | .45543 || 2.15 | .03955 | .48422 || 2.60 | .01358 | .49534
1.71 | .09246 | .45637 || 2.16 | .03871 | .48461 || 2.61 | .01323 | .49547
1.72 | .09089 | .45728 || 2.17 | .03788 | .48500 || 2.62 | .01289 | .49560
1.73 | .08933 | .45818 || 2.18 | .03706 | .48537 || 2.63 | .01256 | .49573
1.74 | .08780 | .45907 || 2.19 | .03626 | .48574 || 2.64 | .01223 | .49585

1.75 | .08628 | .45994 || 2.20 | .03547 | .48610 || 2.65 | .01191 | .49598
1.76 | .08478 | .46080 || 2.21 | .03470 | .48645 || 2.66 | .01160 | .49609
1.77 | .08329 | .46164 || 2.22 | .03394 | .48679 || 2.67 | .01130 | .49621
1.78 | .08183 | .46246 || 2.23 | .03319 | .48713 || 2.68 | .01100 | .49632
1.79 | .08038 | .46327 || 2.24 | .03246 | .48745 || 2.69 | .01071 | .49643
TABLE B.2 (cont.)

ORDINATES AND AREAS OF THE NORMAL CURVE, $\phi(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$ || z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$ || z | $\phi(z)$ | $\int_0^z \phi(z)\,dz$
5 то | 49918 || 3.60 | .00061 | .49984 
ЧЕЧЕ ЕЕЕ 
72 | 00987 | (49674 | 3.17 | .00262 | -49925 I| 3:63 | | s 
73 | 00961 | 149683 | 3.18 | -00254 | .49926 I os Eo 
74 | 100935 | 149693 | 3.19 | .00246 | .49929 | 3-64 | - Е 
.00051 | .49987 
75 | .00909 | .49702 | 3.20 | -00238 i PE -000a | 49087 
76 | .00885 | .49711 | 3.21 | -00231 |. 00039 | 149988 
" 100861 | .49720 3.22 -00224 b a foots | 49988 
S о! ^ * P4 . d 
19 (ro p я “90210 | .49940 | 3.69 | .00044 | .49989 
7 .49989 
80 | .oo792 | .49744 | 3.25 | -00203 | 49952 | 3-70 -00077 | 149990 
81 | 00770 | [49752 || 3.26 | -00196 | -49944 || 3-75 | 00039 | .49990 
82 | -00770 | -$0780 || 3.27 | 00190 | 49946 || 3-72 | ‘обоза | .49990 
83 | 100127 | 49767 || 3.28 | -00184 | -49948 | 3-7% | 100037 | .49991 
84 | “oovoy | 49774 | 3.29 | -00178 | -49950 |j 3.15 | - = 
85 | (00687 | .49781 || 3.30 | -00172 -49958 8.15 d AE 
86 | 00087 | 'doras | 3.31 | .00107 | -49953 | 3-77 | 00083 | .49992 
87 | 100008 | ‘49705 | 3.32 | -00161 | -49955 | 3-78 | -0081 | 40992 
sg | 00049 | 49801 | 3.33 | 00156 | -49957 || 3-70 | 00080 | 49992 
89 | ‘0813 | 149807 | 3.34 | -00151 | -49958 |*- An ee 
3.80 
90 | 00505 | .49813 | 3.35 | 00140 | -49980 | 2-51 | 00028 | 49998 
91 | ‘00558 | 49819 | 3.36 | -00141 | -49083 | 3:g2 | 00027 | .49993 
9з | 00578 | -49819 | 3.37 | -00136 | -49962 | 3-53 | -00026 | -49994 
93 | 00502 | 49831 || 3.38 | -00132 | -49965 | 3.84 | .00025 | .49994 
94 | 100530 | 149836 | 3.39 | 00127 | -4 ae 
3.8 : 
о |.00123 | -49986 00023 | .49004 
ЕЕ 
7 51 | 3.42 | - .49969 | 3. 
By | 00455 -40861 | 3-48 | 00111 | 49920 3-88 | -00021 | :49995 
99 | 00457 | [49861 | 3.44 | -00107 | -* рн atum 
49972 | 3- 19 | 149995 
00 3.45 | .00104 | - 3.91 | .00019 |. 
-01 -00430 po 3.46 | .00100 n 3,92 | 00018 | .49996 
-02 | 00417 | .49874 | 3.47 .00097 “49075 3:93 | .00018 | .49996 
:08 | 00405 | -49878 | 3-45 0 ФОТ | 3.94 | :00017 | 499% 
. 100393 | .49882 | 3.49 |. : pen 
3.95 | .00016 |. 
00087 | -49977 00016 | 49906 
08 on pO $5 :00084 | -49978 297 00015 | .49996 
Q6 | .00370 | -4 c51 | -00084 | 149018 | 3:97 | бо | -40097 
7 | .00358 | .49893 | 3- 00079 | -49979 | 3- 00014 | 49907 
8 -00348 | -49807 ari 00076 | .49980 | 3-99 
-09 | 100337 | .499 154 |. 
.00073 | -49981 
ЕНЕ ДЕ: 
119 | 100307 | .49910 | 3.57 | -0006$ | "49983 
113 | 100298 | 49913 | 3.58 | -00065 | "49983 
-14 | 100288 | .49916 | 3.59 Бывае ЫШ с=с == 
Puoi Бена е 
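The ordinates and areas tabulated in Table B.2 can be recomputed from the defining formula. The sketch below uses only the Python standard library; the function names `ordinate` and `area` are illustrative, and the area from 0 to $z$ is obtained from the error function as $\tfrac{1}{2}\operatorname{erf}(z/\sqrt{2})$.

```python
import math


def ordinate(z):
    """Ordinate of the standard normal curve at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def area(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


for z in (0.00, 0.50, 1.00, 1.96, 2.50):
    print(f"{z:4.2f}  {ordinate(z):.5f}  {area(z):.5f}")
```

Such a recomputation is also a convenient way of checking individual entries of a printed table.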
TABLE B.3

VALUES OF $\chi^2$ CORRESPONDING TO GIVEN PROBABILITIES*

Degrees of freedom n | .01 | .02 | .05

1 6.635 | 5.412] 3.841] 2 
2 9.210 | 7.824 | 5.991] 4 
3 11.341 | 9.837 | 7.815] 6 
4 13.277 | 11.668 | 9.488 |7 
5 15.086 | 13.388 | 11.070 | 9 
6 16.812 | 15.033 | 12.592 | 10 
7 18.475 | 16.622 | 14.067 12 
8 20.090 | 18.168 | 15.507 | 13 
9 21.666 | 19.679 | 16.919 | 14 
10 23.209 | 21.161 | 18.307 | 15 
11 24.725 | 22.618 | 19.675 | 17 
12 26.217 | 24.054 | 21.026 | 18 
13 27.688 | 25.472 | 22.362 | 19 
14 29.141 | 26.873 | 23.685 | 21 
15 30.578 | 28.259 | 24.996 | 22 
16 32.000 | 29.633 | 26.296 | 23 
17 33.409 | 30.995 | 27.587 | 21 
18 34.805 | 32.346 | 28.869 | 25
19 36.191 | 33.687 | 30.144 | 27 
20 37.566 | 35.020 | 31.410 | 23 
21 38.932 | 36.343 | 32.671 29 
22 40.289 | 37.659 | 33.924 | 30 
23 41.638 | 38.968 | 35.172 | 32 
24 42.980 | 40.270 | 36.415 | 33 
25 44.314 | 41.566 | 37.652 | 34 
26 45.642 | 42.856 | 38.885 | 35 
27 46.963 | 44.140 | 40.113 | 36 
28 48.278 | 45.419 | 41.337 | 37 
29 49.588 | 46.693 | 42.557 | 30 
30 50.892 | 47.962 | 43.773 | 40 


For larger values of $n$, the quantity $(2\chi^2)^{1/2} - (2n - 1)^{1/2}$ may be used as a normal deviate with unit standard deviation.

*This table is reproduced from "Statistical Methods for Research Workers," with the generous permission of the author, Sir Ronald A. Fisher, and the publishers, Messrs. Oliver and Boyd.
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TABLE B.3 (cont. ) 
VALUES OF χ² CORRESPONDING TO GIVEN PROBABILITIES*


Degrees of         Probability of a deviation greater than χ²
freedom
   n       .70   |   .80   |   .90   |   .95   |   .98    |   .99
   1      .148   |  .0642  |  .0158  |  .00393 |  .000628 |  .000157
   2      .713   |  .446   |  .211   |  .103   |  .0404   |  .0201
   3     1.424   |  1.005  |  .584   |  .352   |  .185    |  .115
   4     2.195   |  1.649  |  1.064  |  .711   |  .429    |  .297
   5     3.000   |  2.343  |  1.610  |  1.145  |  .752    |  .554
   6     3.828   |  3.070  |  2.204  |  1.635  |  1.134   |  .872
   7     4.671   |  3.822  |  2.833  |  2.167  |  1.564   |  1.239
   8     5.527   |  4.594  |  3.490  |  2.733  |  2.032   |  1.646
   9     6.393   |  5.380  |  4.168  |  3.325  |  2.532   |  2.088
  10     7.267   |  6.179  |  4.865  |  3.940  |  3.059   |  2.558
  11     8.148   |  6.989  |  5.578  |  4.575  |  3.609   |  3.053
  12     9.034   |  7.807  |  6.304  |  5.226  |  4.178   |  3.571
  13     9.926   |  8.634  |  7.042  |  5.892  |  4.765   |  4.107
  14    10.821   |  9.467  |  7.790  |  6.571  |  5.368   |  4.660
  15    11.721   | 10.307  |  8.547  |  7.261  |  5.985   |  5.229
  16    12.624   | 11.152  |  9.312  |  7.962  |  6.614   |  5.812
  17    13.531   | 12.002  | 10.085  |  8.672  |  7.255   |  6.408
  18    14.440   | 12.857  | 10.865  |  9.390  |  7.906   |  7.015
  19    15.352   | 13.716  | 11.651  | 10.117  |  8.567   |  7.633
  20    16.266   | 14.578  | 12.443  | 10.851  |  9.237   |  8.260
  21    17.182   | 15.445  | 13.240  | 11.591  |  9.915   |  8.897
  22    18.101   | 16.314  | 14.041  | 12.338  | 10.600   |  9.542
  23    19.021   | 17.187  | 14.848  | 13.091  | 11.293   | 10.196
  24    19.943   | 18.062  | 15.659  | 13.848  | 11.992   | 10.856
  25    20.867   | 18.940  | 16.473  | 14.611  | 12.697   | 11.524
  26    21.792   | 19.820  | 17.292  | 15.379  | 13.409   | 12.198
  27    22.719   | 20.703  | 18.114  | 16.151  | 14.125   | 12.879
  28    23.647   | 21.588  | 18.939  | 16.928  | 14.847   | 13.565
  29    24.577   | 22.475  | 19.768  | 17.708  | 15.574   | 14.256
  30    25.508   | 23.364  | 20.599  | 18.493  | 16.306   | 14.953


For larger values of n, the quantity $(2\chi^2)^{1/2} - (2n - 1)^{1/2}$ may be used as a normal deviate
with unit standard deviation. 
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TABLE B.4 
VALUES OF t CORRESPONDING TO GIVEN PROBABILITIES*


Degrees of    Probability of a deviation greater than t
freedom          .005       .01
    1          63.657    31.821
    2           9.925     6.965
    3           5.841     4.541
    4           4.604     3.747
    5           4.032     3.365
    6           3.707     3.143
    7           3.499     2.998
    8           3.355     2.896
    9           3.250     2.821
   10           3.169     2.764
   11           3.106     2.718
   12           3.055     2.681
   13           3.012     2.650
   14           2.977     2.624
   15           2.947     2.602
   16           2.921     2.583
   17           2.898     2.567
   18           2.878     2.552
   19           2.861     2.539
   20           2.845     2.528
   21           2.831     2.518
   22           2.819     2.508
   23           2.807     2.500
   24           2.797     2.492
   25           2.787     2.485
   26           2.779     2.479
   27           2.771     2.473
   28           2.763     2.467
   29           2.756     2.462
   30           2.750     2.457
    ∞           2.576     2.326


The probability of a deviation numerically greater than t is twice the probability 
given at the head of the table. 


*This table is reproduced from "Statistical Methods for Research Workers," with the 
generous permission of the author, Sir Ronald A. Fisher, and the publishers, Messrs. Oliver 
and Boyd. 
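
As a small illustration of the note that the probability of a deviation numerically greater than t is twice the tabulated one-tailed probability, the sketch below stores a few one-tailed entries from Table B.4 and converts them to two-tailed levels. The dictionary layout and function names are hypothetical, chosen only for this example.

```python
# A few one-tailed critical values of t copied from Table B.4,
# keyed by (degrees of freedom, one-tailed probability).
ONE_TAILED_T = {
    (10, 0.005): 3.169,
    (10, 0.01): 2.764,
    (20, 0.005): 2.845,
    (20, 0.01): 2.528,
}


def two_tailed_level(one_tailed_probability: float) -> float:
    """Probability of a deviation numerically greater than t."""
    return 2.0 * one_tailed_probability


def is_significant_two_tailed(t_value: float, df: int, one_tailed_probability: float) -> bool:
    """True if |t| exceeds the tabulated point, i.e. significant at twice the column heading."""
    return abs(t_value) > ONE_TAILED_T[(df, one_tailed_probability)]


# |t| = 3.0 with 10 degrees of freedom exceeds 2.764, so it is
# significant at the two-tailed 2% level but not at the 1% level.
print(two_tailed_level(0.01), is_significant_two_tailed(-3.0, 10, 0.01))
print(two_tailed_level(0.005), is_significant_two_tailed(-3.0, 10, 0.005))
```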
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TABLE B.4 (cont.) 
VALUES OF t CORRESPONDING TO GIVEN PROBABILITIES*
Degrees Probability of a deviation greater than t 


TABLE B.5 
5% (ROMAN TYPE) AND 1% (BOLDFACE TYPE) POINTS IN THE DISTRIBUTION OF F* 


*Reproduced from Statistical Methods (5th ed., 1956), by kind permission of
Dr. G. W. Snedecor and the Iowa State Univ. Press. 
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TABLE B.6 
CRITICAL VALUES FOR THE KOLMOGOROV TEST 
Values of $D_N$ such that $P[\max|S_N(x) - F(x)| > D_N] = \alpha$ 


              α 
N     0.20   0.10   0.05   0.01 
5 0.446 0.510 0.565 0.669 
6 0.410 0.470 0.521 0.618 
7 0.381 0.438 0.486 0.577 
8 0.358 0.411 0.457 0.543 
9 0.339 0.388 0.432 0.514 
10 0.322 0.368 0.410 0.490 
11 0.307 0.352 0.391 0.468 
12 0.295 0.338 0.375 0.450 
13 0.284 0.325 0.361 0.433 
14 0.274 0.314 0.349 0.418 
15 0.266 0.304 0.338 0.404 
16 0.258 0.295 0.328 0.392 
17 0.250 0.286 0.318 0.381 
18 0.244 0.278 0.309 0.371 
19 0.237 0.272 0.301 0.363 
20 0.231 0.264 0.294 0.356 
25 0.21 0.24 0.27 0.32 
30 0.19 0.22 0.24 0.29 
35 0.18 0.21 0.23 0.27 
>35 $1.07N^{-1/2}$ $1.22N^{-1/2}$ $1.36N^{-1/2}$ $1.63N^{-1/2}$ 


*Adapted from Massey, F. J., Jr., "The Kolmogorov-Smirnov Test for Goodness of 
Fit," J. Amer. Stat. Assoc., 46, 1951, p. 70, with the kind permission of the author and 
the publisher. 
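
For samples larger than 35 the table gives the critical value as a constant divided by the square root of N. A minimal sketch of that last row follows; the coefficients are the ones tabulated above, while the function and dictionary names are hypothetical.

```python
import math

# Large-sample coefficients from the last row of Table B.6, keyed by alpha.
KS_COEFFICIENTS = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}


def ks_critical_value(n: int, alpha: float) -> float:
    """Approximate critical D_N for N > 35, computed as coefficient / sqrt(N)."""
    if n <= 35:
        raise ValueError("for N <= 35 read the critical value from the table rows")
    return KS_COEFFICIENTS[alpha] / math.sqrt(n)


# Example: N = 100 at the 5% level.
print(round(ks_critical_value(100, 0.05), 3))
```

The observed maximum of |S_N(x) − F(x)| would then be compared with this value, the hypothesized distribution being rejected when the maximum is larger.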
TABLES 


TABLE B.7 
CUMULATIVE BINOMIAL PROBABILITIES 
Probability of r or fewer successes in N independent trials with θ = 0.5 


 N   r = 0  |    1   |    2   |    3   |    4   |    5   |    6   |    7   |    8
 5   0.0313 | 0.1875 | 0.5000
 6   0.0156 | 0.1094 | 0.3438 | 0.6562
 7   0.0078 | 0.0625 | 0.2266 | 0.5000
 8   0.0039 | 0.0352 | 0.1445 | 0.3633 | 0.6367
 9   0.0020 | 0.0195 | 0.0898 | 0.2539 | 0.5000
10   0.0010 | 0.0107 | 0.0547 | 0.1719 | 0.3770 | 0.6230
11   0.0005 | 0.0059 | 0.0327 | 0.1133 | 0.2744 | 0.5000
12   0.0002 | 0.0032 | 0.0193 | 0.0730 | 0.1938 | 0.3872 | 0.6128
13   0.0001 | 0.0017 | 0.0112 | 0.0461 | 0.1334 | 0.2905 | 0.5000
14   0.0001 | 0.0009 | 0.0065 | 0.0287 | 0.0898 | 0.2120 | 0.3953 | 0.6047
15          | 0.0005 | 0.0037 | 0.0176 | 0.0592 | 0.1509 | 0.3036 | 0.5000
16          | 0.0003 | 0.0021 | 0.0106 | 0.0384 | 0.1051 | 0.2272 | 0.4018 | 0.5982
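
The entries of a cumulative binomial table with θ = 0.5 can be recomputed directly, since each is a sum of binomial coefficients divided by 2^N. The following is a minimal sketch with a hypothetical function name; it is offered as a check on the tabulated figures, not as the method by which the table was originally produced.

```python
from math import comb


def cumulative_binomial_half(n_trials: int, r: int) -> float:
    """Probability of r or fewer successes in n_trials independent trials with theta = 0.5."""
    favourable = sum(comb(n_trials, k) for k in range(r + 1))
    return favourable / 2 ** n_trials


# Spot checks for a few (N, r) pairs, rounded to four decimals as in a printed table.
for n_trials, r in [(5, 1), (10, 2), (16, 8)]:
    print(n_trials, r, round(cumulative_binomial_half(n_trials, r), 4))
```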
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TABLE B.8 
SIGNIFICANCE TEST FOR THE MEDIAN (THE WALSH TEST) AT STATED SIGNIFICANCE LEVELS* 
N     α     Either     Or 

5 | 0.125 | Kd. + ds) <0 (di + а) >0 
0.062 ds <0 di0 

6 | 0.094 max[ds, 304: + аву] < 0 min[d2, 3(d: + d3)] > 0 
0.062 #(45 + ds) < 0 ¿(dı + d) — 0 
0.031 ds<0 a>0 

7 | 0.109 max([ds, (d. + 4:)] < 0 min[ds, (di + di] > 0 
0.047 max([de, 3(ds + 4)) < 0 min[d2, (dı + аз)] > 0 
0.031 209 + а) < 0 Adi + d) > 0 
0.016 4% <0 а> 0 

8 | 0.086 | тах[а, (ds + 4)] < 0 тіп[аз, +(4\ + ds)] > 0 
0.055 | max[de, 3(45 + d;)] < 0 min[ds, 3(d: + d4)] > 0 
0.023 max[d7, 3(ds + ds)) < 0 min[do, 4(4 + 43)] > 0 
0.016 Md: + de) < 0 (di + d?) > 0 
0.008 da < 0 di—0 

9 | 0.102 | max[ds, 3d; + doy] < 0 min[ds, 4(di + аў] > 0 
0.043 | max[d;, 3(4 + dy] < 0 min[ds, (4 + ds)] > 0 
0.020 | max[ds, (ds + азу] < 0 min[de, 3d: + 45)] > 0 
0.012 | max[ds, (d; + dj] < 0 min[de, 3 (4 + d;)] > 0 
0.008 4(da + do) < 0 


4(di + do) > 0 


For continuation of Table B.8 see next page 


*Adapted from Walsh, John E., "Applications of Some Significance Tests for the Median
Which Are Valid Under Very General Conditions," J. Amer. Stat. Assoc., 44, 1949, p. 343, 
with the kind permission of the author and the publisher. 
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TABLE B.8 (continued) 


N a Either Or 
10 0.111 | max[de, 4(ds + d19)] < 0 min[ds, 3(d1 + d7)] > 0 
0.051 | max[dz, 3(45 + dio)] < 0 тіп[а, 3 (41 + dc)] > 0 
0.021 | max[ds, 4(ds + d1o)] < 0 min[ds, 3 (4 + ds)] > 0 
0.010 | max[do, 3(4 + d10)] < 0 min[ds, 3(d1 + ds)] > 0 
11 0.097 | тах[а%, (4 + d11)) < 0 min[ds, 3(d1 + ds)] > 0 
0.056 | max[d:, Mds + 4:1] < 0 min[ds, 3 (41 + d7)] > 0 
0.021 | max[3(de + а), 4(ds + d9)] < 0 min[3(di + de), 3(аз + d:)] > 0 
0.011 | max[do, 204 + 4:1) < 0 min[ds, 3(d1 + ds)] > 0 
12 | 0,094 | max[4(ds + dis), Mds + dis)] < 0 | min[4(di + do), (de + &)] > 0 
0.048 | max[ds, 3(ds + diz)] < 0 min[ds, 4(di + ав)] > 0 
0.020 | max[do, 4(ds + diz)] < 0 min[ds, (di + d:)] > 0 
0.011 | max[3(d + 4+2), 4(do + dio)] < 0 | min[4(di + de), 4(ds + d4)] > 0 
13 0.094 | max[A(da + dis), (ds + 4)] < 0 min[4(di + dio), 3(4 + а)) > 0 
0.047 | max[4(ds + dis), 404 + diz)) <0 | тіп + do), 304 + ds)) > 0 
0.020 | max[4(ds + dis), (do + dio)) < 0 | min[B(di + ds), (ds + &)] > 0 
0.010 | max[dio, 209% + dis)] < 0 min[ds, 3(d: + d:)] > 0 
14 0.094 | max[4(da + dis), #045 + dis)] < 0 | min[$(di + di), 4 + d)) > 0 
0.047 | max[4(ds + dis), 34 + 4:3] <O | min[A(di + dio), 4(d2 + do)] > 0 
0.020 | max[dio, 4(4 + d:4)] < 0 min([ds, 304 + ds)] > 0 
0.010 | max[H(d; + dia), (dio + 1) <0 | пе + ds), 3 + &)] > 0 
15 | 0.094 | паж (аи + dis), 04 + 49] <0 | тіп + dis), (ds + di] > 0 
0.047 | max[S(ds + dis), 408 + dis] <0 | п + dis), ds + dio) > 0 
0.020 | паж (ав + dis), Зо + 41) < 0 | тіпа + do), 046 + do) > 0 
0.010 | max[dii, (d: + dis)] <0 miníds, 200 + &)] > 0 


Reject $H_0[\mu = 0]$ against two-tailed alternative $H_1[\mu \neq 0]$ at significance level α 
if either of the corresponding alternatives is true. For one-tailed test against $H_1[\mu < 0]$ 
use column 3 with significance level α/2. For one-tailed test against $H_1[\mu > 0]$ use 
column 4 with significance level α/2. 
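
The decision rule can be sketched for the smallest tabulated case, N = 5. Two assumptions are made here and should be checked against the printed table: that the d's are the observed differences arranged in increasing order, and that the N = 5 rows read ½(d4 + d5) < 0 or ½(d1 + d2) > 0 at α = 0.125, and d5 < 0 or d1 > 0 at α = 0.062. The function name is hypothetical.

```python
from typing import Optional, Sequence


def walsh_two_tailed_n5(differences: Sequence[float]) -> Optional[float]:
    """Smallest tabulated two-tailed level at which H0 (median 0) is rejected, or None.

    Assumes N = 5 and the row conditions stated in the lead-in above.
    """
    if len(differences) != 5:
        raise ValueError("this sketch covers N = 5 only")
    d1, d2, _, d4, d5 = sorted(differences)  # ordered differences d1 <= ... <= d5
    if d5 < 0 or d1 > 0:
        return 0.062
    if 0.5 * (d4 + d5) < 0 or 0.5 * (d1 + d2) > 0:
        return 0.125
    return None


print(walsh_two_tailed_n5([1, 2, 3, 4, 5]))    # all differences positive
print(walsh_two_tailed_n5([-1, 2, 3, 4, 5]))   # one small negative difference
print(walsh_two_tailed_n5([-3, -1, 1, 2, 4]))  # no tabulated condition holds
```

For a one-tailed test only the relevant column's condition would be checked, at half the stated level.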
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TABLE B.9 
CRITICAL VALUES OF U FOR THE MANN-WHITNEY TEST 


(a) α = 0.01 (one-tailed test) 


                                N2 
N1    9   10   11   12   13   14   15   16   17   18   19   20 
 6    7    8    9   11   12   13   15   16   18   19   20   22 
 7    9   11   12   14   16   17   19   21   23   24   26   28 
 8   11   13   15   17   20   22   24   26   28   30   32   34 
 9   14   16   18   21   23   26   28   31   33   36   38   40 
10   16   19   22   24   27   30   33   36   38   41   44   47 
11   18   22   25   28   31   34   37   41   44   47   50   53 
12   21   24   28   31   35   38   42   46   49   53   56   60 
13   23   27   31   35   39   43   47   51   55   59   63   67 
14   26   30   34   38   43   47   51   56   60   65   69   73 
15   28   33   37   42   47   51   56   61   66   70   75   80 


(b) α = 0.05 (one-tailed test) 


                                N2 
N1    9   10   11   12   13   14   15   16   17   18   19   20 
 3    3    4    5    5    6    7    7    8    9    9   10   11 
 4    6    7    8    9   10   11   12   14   15   16   17   18 
 5    9   11   12   13   15   16   18   19   20   22   23   25 
 6   12   14   16   17   19   21   23   25   26   28   30   32 
 7   15   17   19   21   24   26   28   30   33   35   37   39 
 8   18   20   23   26   28   31   33   36   39   41   44   47 
 9   21   24   27   30   33   36   39   42   45   48   51   54 
10   24   27   31   34   37   41   44   48   51   55   58   62 
11   27   31   34   38   42   46   50   54   57   61   65   69 
12   30   34   38   42   47   51   55   60   64   68   72   77 
13   33   37   42   47   51   56   61   65   70   75   80   84 
14   36   41   46   51   56   61   66   71   77   82   87   92 
15   39   44   50   55   61   66   72   77   83   88   94  100 


Abridged from Auble, D., "Extended Tables for the Mann-Whitney Statistic," 
Bulletin of the Institute of Educational Research, Indiana University, 1, no. 2, 1953, 
with the kind permission of the author and the publisher. 


TABLE B.10. JONCKHEERE'S k-SAMPLE TEST 


Prob. (S > S0) for k samples, each of size r 


        k = 3                  k = 3 
S0   r = 2    r = 4     S0   r = 3    r = 5 
4 0.2889 0.4156 9 0.1940 0.3396 
6 0.1667 0.3609 11 0.1387 0.3025 
8 0.0889 0.3090 13 0.0946 0.2672 
10 0.0333 0.2602 15 0.0613 0.2340 
12 0.0111 0.2157 17 0.0369 0.2032 
14 0.1756 19 0.0208 0.1748 
16 0.1404 21 0.0107 0.1489 
18 0.1099 23 0.0048 0.1256 
20 0.0844 25 0.0018 0.1049 
22 0.0632 27 0.0006 0.0867 
24 0.0463 29 0.0708 
26 0.0330 31 0.0572 
28 0.0229 33 0.0456 
30 0.0153 35 0.0359 
32 0.0099 39 0.0214 
40 0.0011 43 0.0120 
47 0.0063 
o | dM 
k=4 k=5 k=6 
5 
s RE, r=3 r=4 r=2 r=3 pm 
0.3273 0.2699 
10 0.1302 0.2659 0.3400 0.2110 
2 | oosa | 02220 | 0309 | 01615 | 0 | от 
14 | 0.0484 0.1823 0.2754 | 0.12 2 . 
0.2454 0.0878 0.2274 0.1521 
$ 0:0267 xs 2172 0.0613 0.1982 0.1215 
18 0.0123 0.1166 0.2172 | .1982 } 
0.1910 0.0412 0.1713 0.0953 
20 0.0052 0.0907 
0.1666 0.0265 0.1468 0.0734 
22 0.0016 0.0691 nae om 
24 | 0.0004 | 0.0515 | 0.1443 0.0162 . : 
à 0.1241 0.0094 0.1049 0.0408 
26 0.0374 .12: 
1058 0.0051 0.0874 0.0294 
28 0.0266 0. 
0895 0.0026 0.0721 0.0207 
30 0.0183 0. 
0751 0.0012 0.0588 0.0142 
32 0.0123 0. 
0624 0.0005 0.0475 0.0094 
a 0000 | би | 0.0379 | 0.0061 
26 0.0050 | 00 0.0299 | 0.0038 
0 A А 
38 0.0030 | 0.042 0.0234 | 0.0023 
9 A B 
40 0.0017 0.033 nied 
0.0272 . 
42 0.0137 
0.0215 
44 0.0102 
0.0168 
46 0.0075 
48 0.0130 Оой 
50 0.0100 
Abridged from the tables in Jonckheere, A. R., "A Distribution-Free k-Sample 
Test Against Ordered Alternatives," Biometrika, 41, 1954, 133-145, by kind permission 
of the author and the publishers. 


434 INTRODUCTION TO STATISTICAL INFERENCE 


TABLE B.11 
VALUES OF tanh z' = r (FISHER'S TRANSFORMATION) 


*Reproduced from Numerical Tables, by J. W. Campbell, by
permission of Mrs. Campbell. 
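
The relation tabulated here, r = tanh z', and its inverse can be evaluated directly with the standard hyperbolic functions. The sketch below uses hypothetical function names and is meant only as a way of reproducing or checking individual table entries.

```python
import math


def r_from_z(z_prime: float) -> float:
    """Correlation coefficient r corresponding to a transformed value z' (r = tanh z')."""
    return math.tanh(z_prime)


def z_from_r(r: float) -> float:
    """Fisher's transformed value z' for a correlation coefficient r with |r| < 1."""
    return math.atanh(r)


# Example: the r corresponding to z' = 0.5, and the round trip back to z'.
r = r_from_z(0.5)
print(round(r, 4), round(z_from_r(r), 4))
```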
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TABLE В.11 (cont.) 
VALUES OF tanh z' = r (FISHER'S TRANSFORMATION) 
[Table entries illegible in source.] 




ANSWERS TO PROBLEMS 


Chapter 1 $. 3 up. 5. qo 

А. (1) 80, 5; (3) 4. 3.4; (4) 2; (7) 355, 1, 315, ot. 5 

B. (1) 3125, 1024; (2) 120; (3) 360; (4) 480; (8) 20; (9) (4n? — 6n + 4) 
Qn — 2)! 


950\ / (1000 50\ (950 

С. (1) 0.1055; (2) $; (3) 15; (4) ( 5 )/( 5 ) = 0:713, P 3 )/ 
wx. = 0.0423; (5) 0.638; (6) 0.665; (7) 32; (10) 1 У», (- DJ; 
aD xit; C- n (Dco — 5з! = 0.548; (12) e=! = 0.368, 0; (13) 0.096, 
0.497, 0.407; (14) +5; (15) +2. 

D. (1) 37.5; (2) 81.44; (3) (1/2)**1, 1; (4) (1 — ру/р; (5) 13; (6) 7n[2; 
(7) $2.67; (8) $6; (9) (a) 2 (b) 43 (c) 223 (d) 103 (e) 365 (f) 264. 

Е. (1) 4; (2) 0.326; (3) 1; (5) 1; (6) 27/3. 


Chapter 2 


А. (3) 0.683; (4) М = 7.26%, О, = 6.39%, О, = 8.22%, Р.В. = 79.6; 
(5) 19.5d., 7.8d., 74.3; (6) 32, 5.2; (8) —0.104, 1.861, 0.111, x = 7.35, ky = 1.86, 
m3/m,3!? = 0.27. 

В. (1) reo, 2.16, 0.88, —0.14; (2) 8, 2.184; (3) 16/7, 3, & (4 4, 3, 3, 
—3log(l — A); (5) (et — 172/12; (6) Nlog[l + p(e^ — Ul Np, Npq, Npq 
(4 — p), №а(1 — бра); (10) of/(x — 1), «f*/((x — 1)2 (о — 2); (11) (Бут, 
(а — ПИБа + 1)}]''°; (12) 2; (13) = (1 + X). 


С. (1) 4, $; (2) 4; (3) (а) 0, (b) e^? = 0.05, Chebyshev inequalities (a) 
P < зв, (bd) P<. 


Chapter 3 

А. (D) 0.774; (2) (a) НЗ, (b) $; (4) 0.063; (5) 0.468; (6) n > 69; (8) C(n,, 
п) = —пб(1 — 0), V(n[n — n/n) = 40(1 — 0)/n. 

B. (1 45; (2) М > 6250; (5) E(X) = $, ИХ) = 5%. 

C. (1) $2e^{-2} = 0.271$; (2) 2.303; (3) 0.423; (4) P(11,5) = 0.0137; (5) 0.0915, 
0.216; (6) (a) 0.143, (b) 0.053, (c) 6 or more. 

D. (1)0.0863, 0.3251, 0.9808, 0.0515; (2) (a) 1.1505, (b) —0.1764; (3) 0.586; 
(4) (а) 75.8, (b) 1037; (5) 20; (6) 793, 4207; (7) 74.3, 3.23; (8) 125; (10) 0.219, 
0.200, 0.214; (11) 0.217, 0.625. 


E. (3) ИХ) = 142, 143, 44.7, 47.5, 102.0, V(A) 


= 9.73, 11.38, 24.57, 
22.02, 38.82. 
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Chapter 4 

A. (D 4, 4 0, -ii Q) Y= (X — 0/4; (3) e7*, 0 <и < ©; (4) 
S(v)/(2av), v = [(и — Б)/а]!??; (5) g(u) 22u^!?/9 (0 < и € 1), gu) = (1 +u 1/2) 9 
(1x u « 4); (6) (а) Flog x), 0 <х < œ, (Б) У -„[Е@т + sin^!x) — 
F(Qn— Dzx-—sin^'x)] -1 < x < 1, (0) х, 0< x< 1. 

В. (3) (а) (2) T G), (b) e* Г(8)/(3-67); (4) [BG, (N — 3)2] 7! :(9) (m — 1)/n. 

C. (5) $E(X) = e^{1/2}$, $V(X) = e(e - 1)$; (6) 0.2404. 
Chapter 5 

A. (1) ty = 0.0510, t, = 1,5530, 90% limits 0.026 to 0.784; (2) t, = 0.736, 
t, = 1.264, 95% limits 0.936 to 1.464; (3) X + 2.330о№` 12; (8) 5 to 25.3, com- 


pared with 5 to 39.2 from problem (7); (9) (b) в а jn - бус" 


B. (2) 0.725, 0.753, 0.613, — 0.046; (3) 0.061, 0.074, C(K,, k2) = 0.0031; 
(4) 0.046, 0.091. 

С. (1) No, P(|X — 20| > 1.8) = 0.027; (2) 12.28 to 12.38 sec, 17; (3) Yes, 
Р(|, = xi| > 5) = 0.009; (4) 0.08 to 0.17, 0.087 to 0.175; (5) No, P — 0.17; 
(6) 0.38 to 0.78; (7) 0.06 to 0.27; (8) hypothesis not rejected, P = 0.11; (9) 62 
or more. 


Chapter 6 
А. (8) W)/V(d Упр) = 0.876; (11) 0 = Xm — 1)/m; (12) МА - 0) – 
AY x? 2 1 + 22) Уху = 0. 

Е р и. M ifm» 75 + 2.79N-"/?, 0.14, 0.57, 0.92, 23; 
(3) reject H, if т> К, where k is given by P(10k, 10) < 0.05, К = 1.6, 
power = P(16,20) = 0.84; (4) x > c, where x © B(c, n, 0o), 1 — В = B(c, n, Ө,), 
n = 50; (5) ||» с t= М! (т — uo)s, 5 = sample standard deviation; 


(9 йу (x — S01 - и - F jx, №? = Y (0n - 2— 


B(x; — x)Y^; (7) 6561. 

€ o у Pp [1*0 рь 141; (8) 0898; (5) «(D = = 900), POZ, 
De 4), 046; (7) accept if fi PC — 0" "0100 40 < jor - 9 
&(0)L.(0) d0. 


Chapter 7 
din OPE E E TER E 06000. 09 IR 
(a) 180%, (b) 216%; (4) 2 — a e P al ‚ хо = 0.644. 
В. 58, 1.54; (4) ИТІМ) = 1-7. Е | 
С i mo E s variance within samples — 161.6, Ит) = 30.66; 
й ) = 11.62, 
(4) 0.406 6 212 242 /С. 
: 6. Ту = 23.7 x 105; (4) 912/6, =02 1C; 
D. ФИФ = 2345 х 10°; CD Pat y Mio ~ 1) + acre 


cadet Mac 2 Муз Mo A E 2.36, N= 487, expected cost = 279, 


f = 302, С' = 319. 

Or pun ps A pics 5 ; (7) Eo(n) = [= x (s А a 
ies BV luo — г Ps x log(, шо), E,(n) = [Id = Вов A + B log Ho 1 
+ Uo log(u,/to)]- 
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Chapter 8 , 

А. (2) 690 or more; (3) 0.308; (4) 62; (6) K(h) = — (1/2) log(1 — 26?hjn), 
Elka) = 07, ИК») = 2o*[n; (T) п/(п + 2). К 

В. (1) 55.2 to 59.3; (2) 179.1 to 182.9; (3) Yes, at 5% level, P = 0.04; 
(4) Yes, P = 0.10; (5) —5.3 to 18.3 mm; (6) —3.2 to 7.4 Ib; (7) No, t= 1.56; 
(8) Yes, P = 0.02; (9) (a) increase barely significant, t = 2.11, (b) highly significant,
t = 5.66; (12) 0.69; (13) 8.4; (15) accept lot if m + 3.365 x 3.0, 
д = 6.20, К = 3.36. 

С. (1) 12.6 to 168; (2) highly significant, F = 23; (3) No, F = 2.0; (4) Yes, 
Е = 2.10; (7) 6.94; (8) 10.1; (9) F = 1.36, 4 = 1.59. 

D. (1) 276, 281; (2) Yes, t < 0.005; (3) Yes, t < 0.005; (4) (а) significant at 
5% level but not at 1%, (b) not at 5%; (5) А = 12.5; ё = 4.06, (7) g(R) = 
(N = De" (1 — e7^)"7*: (8) 0.35 to 4.78; (9) HN — ПКМ + 1, 2P(N — 1)/ 
[(М + DAN + 2)]; (10) 8. 


Chapter 9 

A. (1) homogeneity accepted, M/c = 4.13; (4) highly significant, F = 8.3; 
(5) No, F = 3.33. 

B. (1) variety effect almost sig. at 1% level, F = 3.11, block effect sig. at 
5%, F = 2.76; (2) F for fertilizers = 71.4; (3) F(makes) — 34.1, F(cities) = 7.43, 
F(interaction) = 23.2; (4) (a) —8.5, 0.5, 0.5, 7.5, (b) —1.07, —0.80, 1.90, 
—0.04; (5) (a) 0.002, —0.178, 0.140, 0.120, —0.058, 0.008, 0.100, — 0.006, 
—0.128, (b) — 0.086, 0.090, —0.053, —0.020, 0.069; (6) (a) 0.17, — 1.05, 1.07, 
— 1.62, 1.41, (b) 0.37, —0.54, 0.16 (c) 41; = —0.56, 1.62, — 1.05, 2; = —0.84, 
1.24, —0.40, 33; = 0.54, —0.75, 0.22, $4; = —1.24, 1.31, —0.06, 95; = 2.10, 
—3.40, 1.31; (8) à? = 25.9, 6.2 = 37.8, correlation coefficient — 0.59, power 
& 0.28; (9) 6.8, 9.3, 2.6, 4.5. 

C. (1) 0.55, 0.195, 3.37, 0.455, F(makes) — 1.47; (3) city effect highly 
significant, F — 1.56, box effect non-sig., F = 1.2, 6.2 = 0.197, 8g — 0.0005, 
6? = 0.019; (5) row and column effects non-sig., treatment effect highly sig., 
£u. 

D. (1) F(detergents, elim. blocks) — 27, F(blocks, elim. deter: 


both sig. at 1%; (2) 0.307, B and C only; (3) BC, BD and в 
almost sufficient). 


gents) = 3.95, 
С; (4) 5(r = 4 


Chapter 10 
A. (1) P = 0.40; (2) No, P = 0.66; (3) not at 5% level, P = 0.02; (4) Yes, 
P < 0.01; (5) No, P = 0.007; (6) fit satisfactory, P = 0.14; (7) P = 0.42; 
( P= 0.85; (9) P= 017; 00) (@) 2 179, P — 0003, (by пе 
Р = 0.035, (с) у? = 7.96, Р = 0.046. 
В. (1) Dy = 0.549; (2) not quite Sig. at 5%; (3) Yes, МЇ? p, = 0.86, 


(4) Yes, max |Sy(x) — F(x)| = 0.092. 
C. (1 Yes, = 0.51; (2) No, T = 18.5; (3) No, z x 0.50; (4) null hy- 
pothesis rejected at about 2% level: (5) No, P(one-tailed) = 0.26. 
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D. (1) No; (2) randomness not rejected, P > 0.20; (3) number of runs 
= expected number (33), distribution of runs not random by xX test; (4) = = 3.52, 
highly sig.; (5) Yes, S = 51., P < 0.006. 


Chapter 11 

А. (D (а) >) = Xa – x), OS x«a, №) = 2yla*, Ox y«a, 
ny = (a + x)2, č, = у[2, их = al3, ity = 2a[3, cy? = а?/18, oy? = a7/18, 
oxy = a?/36, p = 1/2; (2) necessary but not sufficient; (9) 1/2. 

B. (1) ye = 0.886x — 0.57, x, = 0.825y + 8.55; (2) 0.855, 0.62 to 1.15, 
0.58 to 1.07; (3) 125, 80, 15.1, 9.05, 75.0, 0.55; (4) у, = 0.070x 4- 58.2, 0.052 
to 0.087, 0.487; (5) No, t — 1.91; (6) 20; (7) No, relative accuracy — 1.19; 
(8) 64, 4.0; (9) 16, 13, 17, 0.60. 

C. (1) у, = 16.58 — 1.27х, 1923; (2) 33.9 in., 24.5 to 43.9 (approximation 
26.7 to 41.1); (3) Й = 3.214, à = 6.726, 6? = 0.00062; (4) (a) y, = 3.224х + 
6.668, (b) ye = 3.214х + 6.723, .(a) 3.199 to 3.263, (b) 3.182 to 3.246, 
6,7 = 0.00071. 

D. (1) Yes, г = —3.43, No, Р = 0.07; (2) 0.319 to 0.774, No (P = 0.11); 
(3) No, P — 0.37; (4) 0.705; (5) р = 1 = 1/(2п)). 

E. (1) rs = 0.796, гк = 0.554; (2) rp = 0.636, rs = 0.733; (3) hypothesis 
of independent random rankings not rejected, P = 0.14; rs = 0.624, ry — 0.422, 
agreement not sig. at 1075 level. 

Е. (1) x? = 76, highly sig., С 
Р = 0.24, (b) P = 0.23; (4) No, P 


= 0.32; (2) Yes, P = 0.01; (3) No, (a) 
я 0.10; (5) Yes, P < 0.01. 


Chapter 12 
A. (3) >. 
(5) у. = — 1.0051 — 0.000065», 
В. (1) rows of A^! are (0.39 
— 0.000167), ( — 0.03651, — 0.000167, 0.00482), ё? 
—0.125 < В, < —0.037, —0.522 < B, < —0.163; (3) а! = 1.134, а 
—0.0000436, аз = 0.0000633, аё = 0.0355, standard errors of b, В», 
b4 = 0.0032, 0.0039, 0.092; (4) rows of A are (20, 98.2, 11,880), (98.2, 506.4, 
57,284), (11,880, 57,284, 7,201,220), 62 = 8.235, И») = c?[8.62 — 1.112x — 
0.01632 + 0.000874xz + 0.0603x? + 0.0000101=?]. | 
C. Фу = 0.778 + 0.557x + 0.1857х2; (2) not at 5% level, F = 4.4 pas 
1 and 4 d.f.; re = 0.9974, r = 0.9854; (4) у, = 196.54 + 2.918x — 0.0698x* + 
0.000299x?; (5) cubic regression not sig., standard errors 1.56, 0.15, 0.0435, 
исз (1) r? = 0481, E? — 0.584, Е = 2.13, linearity acceptable; (2) Ej. = 
2 — 0.271, №, F = 1.51 апа 1.32. 
и: "ib (a) "E 4 x — 15 = 0, (b) у = 1000791; (2) 100y = e = 10°; 
(бу фу yo x0 OBE", x c 0.589е'-058*; (5) а = 0.509, b= -2nie 
(6) a = 200, b — 412, а= 0.733, у. = afl + bg] 5, и = (x — 1870/10; 
(8) a = 0.768, b = 3.86. 


— 3.37x + 0.003642 + 9.30; (4) y, = 17.19 — 0.08 1х — 0.3422; 
+ 0.003894х, + 0.32505х.. 

58, — 0.00377, — 0.03651), (— 0.00377, 0.000286, 
= 2.34; (2) 15.56 < Po < 18.82, 


22 — 
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Chapter 13 

i. (1) 0.67; (2) ros; = 0.759, туз = 0.097, ү» = —0.436, ғ, = 0.802, 
Yes, F = 15.3; (3) ro; = 1.29 (impossible); (4) Гоз = 0.707, т,» = —0.715; 
(5) rox = 0.187, љ = 0.165, гоз = 0.468, r,2 = 0.732, 4 = 0201, 7, = 0.083, 
To1,23 = 0.0035, гоз 1з = 0.095, коз, = 0.454, о,12з = 0.485; (6) weighted 
mean = —0.8975. 

В. (D Ду» ®) = ОИ — p,? ра — py? + Soup] "t 
ехр(– 0/2), О = [x' (1 — p) + УТ — p,2) + 2ху(р,.ру. — p.) + 
2х2(р,уруг — Paz) + 2у2(р,ур,. — р.) — py? — p? — Ру? + 2р,ур,„руг]; 
(4) f(x2|x,) = (2л) 1/2 [Ci (6,02; = С! 22] 2 ехр(— 0/2), О = (Ciox, — 
Сух) 0С, (С.С — 0); (5) (Cy2C33 – Ci3C23)X2 + (C40, — 
Cy2C23)3]/[C22C33 — C237]; T? = 20.5, hypothesis rejected at 197 level. 

C. (1) L = 0.890x, + 3.95x2, or L = 9х, + 40x, approx.; (2) L = -xX 
+ 40x, — 22x3, L, = 1130, L, = 2123, L = 1487; (3)L = —0.0312x, — 0.1839», 
+ 0.2221х; + 0.3147x4, Ly = 0.669, L, = —0.384, criterion L < 0.142 for (1); 
(4) T? = 26.34, highly sig. 
Bi dm [0.4286 0.5714]; (2) (4, 4, Th т), (4, 1, 5, ch), 
G b 15 75), matrix not regular; (3) (а) (0.4, 0.6), (b) (3, 3, 2); (4) Р(ј > ј) = 
2jn — jm, PU > j- 1) = j?]ni, Р(јэј + 1) = (п — jl, 0 « j < n, 
expected number of black balls in (1) = 2; (© 0.55. 
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Acceptance error, 131, 133 
plan, 192 
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Addition law (probability), 7 
Additivity, 214, 240 
Adjoint, 407 
Airey function, 190 
Aitchison, J., 94 
Aitken, A. C., 406 
Allan, F. E., 354 
Analysis of variance, chap. 9 
assumptions, 214, 240 
in regression, 337, 341 
model I, 223, 236 
model II, 225, 227, 232, 245 
model III, 228, 231 
nested model, 232 
one-way classification, 217, 219, 224 
power of test, 227 
two-way classification, 219, 225, 221 
Anderson, R. L., 340, 352, 354 
Anderson, T. W., 380 
Angle brackets, 103, 394 
Angular transformation, 70, 117 
Anscombe, F. J., 73, 77 
Arithmetic mean, see Mean 
Array, 280 
Association (attributes), 313, 315 
Auble, D., 432 
Autocorrelation coefficient, 375 


Bartlett, M. S., 73, 77, 207, 213, 215, 
250, 303, 327, 380 
Bayes, T., 150 
assumption, 144 
criterion, 142, 145 
rule, 144, 149 
Bernoulli, J., 53 
distribution, see Binomial 
law of large numbers, 56, 68 
numbers, 107, 395 
Bertrand, J., 20, 27 
Beta distribution, 83, 205 


Beta function, complete, 84, 94, 389 
incomplete, 85, 196, 245 
Beta-prime distribution, 83, 194, 366 
Betting odds, 9 
Bias (estimator), 128 
Bienaymé theorem, 43 
-Chebyshev inequality, 45 
Binomial distribution, 52, 55 
approximations, 59, 64, 68 
confidence limits, 114, 116, 119 
cumulants, 55 
cumulative, 53, 429 
moments, 54 
recursion formula, 54, 56 
Binomial graph paper, 117 
Binomial sequential test, 166 
Binomial theorem, 13 
Bivariate normal distribution, 148, 282, 
284, 285, 364, 380 
Block effects, 222, 239 
Blocks, complete, 220, 222, 234, 237 
incomplete, 237, 239 
Brandt-Snedecor formula, 317 
Bross, I. D. J., 150 
Brown, J. A. C., 94 
Buffon, G. L. (Comte de), 21, 273 


Camp, B. H., 67 
Campbell, G., 51 
Carter, A. H., 211 
Cauchy distribution, 50, 92, 130 
Cauchy principal value, 50, 385, 386 
Central limit theorem, 90 
Change of variables, 386 
Chapman-Kolmogorov equation, 373 
Characteristic function, 43 
Charlier check, 35 
Gram-Charlier curves, 90 
Chebyshev inequality, 45, 56, 68, 101 
Chi-square distribution, 85, 136, 402 
approximations, 87 


cumulants, 86 
non-central, 136, 244, 347, 372, 397 
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theorems on, 87 
Chi-square test, contingency, 315 
goodness of fit, 253, 255, 275 
of hypothesis, 251 
for contingency tables, 315, 318 
Choleski method (square root method), 
330 
Christian, R. R., 27 
Class boundaries, 29 
interval, 30 
marks, 29, 34 
Clemm, D. S., 213 
Cluster sampling, 152, 155 
Cochran, W. G., 88, 94, 173, 208, 213, 
250, 278, 327 
Coding (of variate), 34, 105 
Cofactor, 406 
Combinations, 12 
Complement (set), 4 
Components of variance, 225, 227 
Conditional probability, 10, 11, 134, 
397 
expectation, 156, 233, 398 
Confidence belt, 97, 258, 296 
coefficient, 96, 97 
limits, 97, 113, 119, 183, 185, 193, 
294, 296, 334 
Conformable matrices, 403 
Consistency, 101, 126 
Contingency, 314, 317 
Contrasts, 241 
Convergence (stochastic), 3, 56 
Convolution, 47 
Coolidge, J. L., 27 
Cornish, E. A., 183 
Correlation, intra-class, 226 
multiple, 357 
ordinary, 44, 281, 287 
partial, 361, 363, 376 
rank, 308, 310, 313 
serial, 158, 171, 240 
Correlation coefficient (Pearson), 44, 
281, 287, 357 
(Kendall), 309, 313 
(Spearman), 309, 314 
distribution of, 306, 308 
significance of, 295, 307, 313 
Correlation index, 352 
Correlation matrix, 357 
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distribution of, 345, 347 
Covariance, 44, 226, 280, 377 
Cowden, D. J., 349, 354, 355 
Cramer, G. (rule), 330, 408 
Cramér, H., 128, 150, 278 
Cramér-Rao inequality, 128 
Cumulant generating function, 41, 43 
Cumulants, 41, 103, 105 
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Distribution function | 

Cumulative frequency, 29, 31 
Curtiss, J. H., 51, 77 
Curve-fitting, 32, 90, 335 
goodness of fit, 253, 255 
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Degrees of freedom, 86, 180, 194, 218, 
252, 316 
Delta process, 72 
Density (probability), 19, 46 
Design of experiments, 28, 219, 232 
balanced, 237 
complete, 220, 234, 237 
efficiency factor, 239 
incomplete, 237 
Determinants, 393, 406 
functional, see Jacobian 
Deviation, mean absolute, 147 
Differentiation under integral sign, 392 
Digamma function, 147 
Discriminant function, 366, 369 
Disjoint (sets), 5 
Dispersion, measures of, 33 
Distance (between populations), 371 
studentized, 372 
Distribution function, 20, 79, 256 
joint, 46, 48, 178 
of sum of variates, 47 
Distribution-free methods, 251—273 


Distributions, special, see under separate 
names 
Dixon, W. J., 199, 213 
Domain, 15, 17 
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Double sampling, 158 
Dwyer, P. S., 354 


e, 381 
Efficient estimator, 101, 126, 206 
Elkin, J. M., 51 
Equations, normal, 286, 329, 330, 410 
Equivalence (events), 5 
Erdélyi, A., 380 
Ergodic chain, 380 
Errors, 286, 292, 299 
curve of, 64 
of first kind, 131, 132, 163 
of second kind, 131, 163 
true, 332 
Estimate, standard error of, 294 
Estimation, 123-147 
components of variance, 225 
contrasts, 241 
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point, 96 
regression coefficients, 297, 303 
treatment effects, 223 
variance from range, 202 
X for given Y, 296, 324 
Estimators, 97, 130 
consistent, 101, 126, 300, 303 
efficient, 101, 126, 130, 206 
invariant, 127 
least squares, 208 


maximum likelihood, 51, 123, 125, 
129, 215, 297 
most efficient, 101, 126 
sufficient, 125, 127 
unbiased, 101, 105, 127, 153, 215, 

complementary, 4 
compound, 3 
independent, 11 
mutually exclusive, 5 
simple, 3 
Expectation, 17, 102 
conditional, 156, 233, 279, 398 
Exponential distribution, 80, 206, 212 
Exponential function, 382 
Exponential regression, 347 
Extreme values, distribution, 19 
Factorial, 12, 388 
moment generating function, 41 
moments, 41, 58 
Stirling approximation, 383 
Factors (design), 232 
F-distribution, 193, 196, 344, 360 
Feller, W., 12, 27, 257, 278 
Fiducial inference, 99 
interval, 96, 120 
Finite population, 101, 117 
Fisher, Sir Ronald A., 41, 94, 122, 213, 
250, 327, 380 
analysis of variance, 241 
angular transformation, 70, 117 
approximation to χ², 87 
approximation to t, 183 
discriminant function, 369, 379 
distribution of F, 194, 211 
distribution of r, 306, 360 
distribution of t, 180 
exact test, 319 
extreme values, 207 
fiducial inference, 96 
g-statistics, 197 
inequality, 128 
k-statistics, 103 
maximum likelihood, 123 
theorem (chi-square), 88, 293 
z'-transformation, 307, 361 
Fisher and Yates (tables), 77, 194, 213, 
235, 250, 340 
Fix, E., 150, 397 
Fourier transform, 43 
Fractiles, 33 
Fraser, D. A. S., 28, 51 
Freeman, G. H., 327 
Freeman, M. F., 67, 77 
Frequency curve, 31 
distribution, 28, 29, 289 
polygon, 30 
Frequency, relative, 31, 52 
Function, 15 
characteristic, 43 
indicator, 16, 18, 54 
orthogonal, 87 
Functional relation, 298 


Games, theory of, 147 
Gamma function, complete, 82, 83, 388 
incomplete, 83 
Gamma distribution, 81, 128 
Gauss, C. F., 64 
Geiger, H., 253 
Geometric distribution, 51 
Gibson, W. M., 327 
Glover, J. W., 14, 27, 54, 77 
Gnedenko, B. V., 94 
Gompertz curve, 354 
Goodness of fit, 253 
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Gram-Charlier system, 90 
approximation, 61 
Grouping, error of, 34, 107, 395 
method of, 303 
of frequencies, see Pooling 
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Gumbel, E. J., 207, 213 
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Hilferty, M. M., 87, 94 
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Homogeneity (of variance), 214, 241 
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Homoscedasticity, 284, 292 
Hotelling, Harold, 327, 346, 354, 380 
generalized T-test, 365, 369 
distribution of r, 306 
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Houseman, E. S., 340, 352, 354 
Hurwitz, W. N., 173 
Hypergeometric distribution, 58 
confluent function, 245, 346 
function, 58, 360 
sampling, 57, 117 
Hypothesis, alternative, 132, 162, 214 
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null, 131, 162, 214, 397 
one-sided, 132 
simple, 132, 162 
test of, 131, 397 
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Improper integrals, 385 
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of variates, 19, 176 
Indicator function, 16, 18, 54 
Inefficient estimator, correction of, 130 
Interaction (events), 5 
(analysis of variance), 220, 222, 227, 
237 
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Interval estimation, 96 
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Kelley, T. L., 77 
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test, 256, 258 
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k-statistics, 36, 103, 105, 394 
covariance of, 110 
generalized, 104 
standard errors of, 109 
Kurtosis, 37, 42, 55, 61, 202, 207, 256 
variance of, 197 


Lagrange multiplier, 161, 187, 315, 400 
Laplace, P. S., 144 
distribution, 50 
Large numbers, law of, 3, 56 
Least squares, method, 286, 300, 347, 
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Leibniz formula, 393 
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maximum, see Maximum likelihood 
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Lindley, D. V., 300, 327 
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Logarithmic transformation, 90 
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process, 373, 375 
Massey, F. J., 257, 259, 278 
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Mathematical model, 217, 220, 235, 
238 
Matrix algebra, 402-412 
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adjoint of, 407 
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inverse of, 330, 407, 408 
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null (zero), 403 
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rank of, 407 
regular, 374 
singular, 364, 407 
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square, 402 
symmetric, 329, 405 
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estimators, 51, 123, 129, 215, 297, 
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May, M. A., 363 
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confidence limits, 113 
distribution of, 111, 175 
experiment on, 112 
standard error of, 109
Means, difference of, 121, 165, 216
Measure, 6, 7, 12, 16
Median, 32, 50, 204
distribution of, 204, 205
Merrington, M., 197, 213, 216, 250
Mid-range, 119, 212 
Miller, L. H., 257, 278 
Minimax principle, 142, 146
Models (A. of V.), 223, 225, 228
Moment generating function, 40
Moore, G. H., 268, 278
Moses, L. E., 278 
Moshman, J., 94 
Most efficient estimators, 101 
Mosteller, F., 117, 122 
Multinomial distribution, 252, 364, 401 
theorem, 400 
Multiple correlation, 357, 360 
regression, 328, 356 
Multiplication law (probability), 11
315, 400
normal distribution, 363, 377


Negative binomial distribution, 93, 120
Non-central distributions,
chi-square, 136, 244, 347, 372, 397 
F, 366, 372 
t, 190, 192 
Non-normality, effect of, 207, 240, 245, 251
tests for, 208, 256 
Non-parametric tests, 251-273
Normal correlation surface, 284 
Normal distribution, 64, 66, 174-208 
approximation by, 64, 66, 68, 321 
cumulants of, 69 
standard form, 66, 68, 70, 77, 391 
Normal equations, 286, 329, 330, 410 
Normal process, 375 
Normal test, one-sided, 136 
two-sided, 138 
Normality, assumption of, 174, 207, 240, 245, 251
Nuisance parameters, 125 
Null hypothesis, 131 
Null set, 5
Pearson, E. S., 27, 77, 94, 122, 150, 198, 201, 213, 250
Permutations, 12
Plackett, R. L., 355, 380
Point estimation, 96
Poisson distribution, 59, 61, 77, 83, 93
approximation by, 61
confidence limits, 98
cumulants, 61
Poisson sampling scheme, 56
Polynomials, fitting of, 335
orthogonal, 337
Population, finite, 102, 117
mean, 37 
relation to sample, 37, 95-119 
Power, of test, 134, 168 
chi-square test, 254, 397 
F-test, 196, 227, 244, 245 
Kolmogorov test, 258 
sign test, 262 
Mann-Whitney test, 267 
run tests, 270 
t-test, 189 
Walsh test, 264 
Wilcoxon test, 264
Predicted value, variance of, 331
Predictors, 328, 356 
Probability, 1-23 
addition law, 7 
conditional, 10, 11, 134, 279, 397
distribution, 37, 77, 78-92
graph paper, 70, 77
joint, 279
Probability transformation, 79
166, 191
Randomized decision, 139
tests for, 267
as estimator of variance, 202
distribution of, 120, 200, 203, 212
Ranges, quotient of, 204
Rao, C. R., 128
Regression, 279-322, 328-350
and maximum likelihood, 330 
curvilinear, 279, 335
linear, 279, 281
multiple, 328, 356
plane, 356
confidence limits, 294
Seidel, P. L., 349

Sequential sampling, 131, 163, 165, 373
test, 165
Serial correlation, 158, 171, 240
Sets, finite, 4
Sheppard's corrections, 107, 395
Siegel, S., 278
Sign test, 260, 414
Smirnov, N., 259, 260, 278


Smith, B. Babington, 27 
Smith, K., 278